From patchwork Tue Jun 28 19:47:21 2022
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12898835
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen, Lorenzo Bianconi,
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 01/52] libbpf: factor out BTF loading from load_module_btfs() Date: Tue, 28 Jun 2022 21:47:21 +0200 Message-Id: <20220628194812.1453059-2-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC From: Larysa Zaremba In order to be able to reuse BTF loading logics, move it to the new btf_load_next_with_info() and call it from load_module_btfs() instead. To still be able to get the ID, introduce the ID field to the userspace struct btf and return it via the new btf_obj_id(). To still be able to use bpf_btf_info::name as a string, locally add a counterpart to ptr_to_u64() - u64_to_ptr() and use it to filter vmlinux/module BTFs. Also, add a definition for easy bpf_btf_info name declaration and make btf_get_from_fd() static as it's now used only in btf.c. Signed-off-by: Larysa Zaremba Signed-off-by: Alexander Lobakin --- tools/lib/bpf/btf.c | 110 +++++++++++++++++++++++++++++++- tools/lib/bpf/libbpf.c | 52 ++++----------- tools/lib/bpf/libbpf_internal.h | 7 +- 3 files changed, 126 insertions(+), 43 deletions(-) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index ae1520f7e1b0..7e4dbf71fd52 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -121,6 +121,9 @@ struct btf { /* Pointer size (in bytes) for a target architecture of this BTF */ int ptr_sz; + + /* BTF object ID, valid for vmlinux and module BTF */ + __u32 id; }; static inline __u64 ptr_to_u64(const void *ptr) @@ -128,6 +131,11 @@ static inline __u64 ptr_to_u64(const void *ptr) return (__u64) (unsigned long) ptr; } +static inline const void *u64_to_ptr(__u64 val) +{ + return (const void *)(unsigned long)val; +} + /* Ensure given dynamically allocated memory region pointed to by *data* with * capacity of *cap_cnt* elements each taking *elem_sz* bytes has enough * memory to accommodate *add_cnt* new elements, assuming *cur_cnt* elements @@ -463,6 +471,11 @@ const struct btf *btf__base_btf(const struct btf *btf) return btf->base_btf; } +__u32 btf_obj_id(const struct btf *btf) +{ + return btf->id; +} + /* internal helper returning non-const pointer to a type */ struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id) { @@ -819,6 +832,7 @@ static struct btf *btf_new_empty(struct btf *base_btf) btf->fd = -1; btf->ptr_sz = sizeof(void *); btf->swapped_endian = false; + btf->id = 0; if (base_btf) { btf->base_btf = base_btf; @@ -869,6 +883,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf) btf->start_id = 1; btf->start_str_off = 0; btf->fd = -1; + btf->id = 0; if (base_btf) { btf->base_btf = base_btf; @@ -1334,7 +1349,7 @@ const char *btf__name_by_offset(const struct btf *btf, __u32 offset) return btf__str_by_offset(btf, offset); } -struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf) +static struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf) { struct bpf_btf_info btf_info; __u32 len = sizeof(btf_info); @@ -1382,6 +1397,8 @@ struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf) } btf = btf_new(ptr, btf_info.btf_size, base_btf); + 
+	if (!IS_ERR_OR_NULL(btf))
+		btf->id = btf_info.id;
 
 exit_free:
 	free(ptr);
@@ -4819,6 +4836,97 @@ static int btf_dedup_remap_types(struct btf_dedup *d)
 	return 0;
 }
 
+/**
+ * btf_load_next_with_info - get first BTF with ID bigger than the input one.
+ * @start_id: ID to start the search from
+ * @info: buffer to put BTF info to
+ * @base_btf: base BTF, can be %NULL if @vmlinux is true
+ * @vmlinux: true to look for the vmlinux BTF instead of a module BTF
+ *
+ * Obtains the first BTF with the ID bigger than the @start_id. @info::name and
+ * @info::name_len must be initialized by the caller. The default name buffer
+ * size is %BTF_NAME_BUF_LEN.
+ * FD must be closed after BTF is no longer needed. If @vmlinux is true, FD can
+ * be closed and set to -1 right away without preventing later usage.
+ *
+ * Returns pointer to the BTF loaded from the kernel or an error pointer.
+ */
+struct btf *btf_load_next_with_info(__u32 start_id, struct bpf_btf_info *info,
+				    struct btf *base_btf, bool vmlinux)
+{
+	__u32 name_len = info->name_len;
+	__u64 name = info->name;
+	const char *name_str;
+	__u32 id = start_id;
+
+	if (!name)
+		return ERR_PTR(-EINVAL);
+
+	name_str = u64_to_ptr(name);
+
+	while (true) {
+		__u32 len = sizeof(*info);
+		struct btf *btf;
+		int err, fd;
+
+		err = bpf_btf_get_next_id(id, &id);
+		if (err) {
+			err = -errno;
+			if (err != -ENOENT)
+				pr_warn("failed to iterate BTF objects: %d\n",
+					err);
+			return ERR_PTR(err);
+		}
+
+		fd = bpf_btf_get_fd_by_id(id);
+		if (fd < 0) {
+			err = -errno;
+			if (err == -ENOENT)
+				/* Expected race: non-vmlinux BTF was
+				 * unloaded
+				 */
+				continue;
+			pr_warn("failed to get BTF object #%d FD: %d\n",
+				id, err);
+			return ERR_PTR(err);
+		}
+
+		memset(info, 0, len);
+		info->name = name;
+		info->name_len = name_len;
+
+		err = bpf_obj_get_info_by_fd(fd, info, &len);
+		if (err) {
+			err = -errno;
+			pr_warn("failed to get BTF object #%d info: %d\n",
+				id, err);
+			goto err_out;
+		}
+
+		/* Filter BTFs */
+		if (!info->kernel_btf ||
+		    !strcmp(name_str, "vmlinux") != vmlinux) {
+			close(fd);
+			continue;
+		}
+
+		btf = btf_get_from_fd(fd, base_btf);
+		err = libbpf_get_error(btf);
+		if (err) {
+			pr_warn("failed to load module [%s]'s BTF object #%d: %d\n",
+				name_str, id, err);
+			goto err_out;
+		}
+
+		btf->fd = fd;
+		return btf;
+
+err_out:
+		close(fd);
+		return ERR_PTR(err);
+	}
+}
+
 /*
  * Probe few well-known locations for vmlinux kernel image and try to load BTF
  * data out of it to use for target BTF.
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 335467ece75f..8e27bad5e80f 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -5559,11 +5559,11 @@ int bpf_core_add_cands(struct bpf_core_cand *local_cand,
 
 static int load_module_btfs(struct bpf_object *obj)
 {
-	struct bpf_btf_info info;
+	char name[BTF_NAME_BUF_LEN] = { };
 	struct module_btf *mod_btf;
+	struct bpf_btf_info info;
 	struct btf *btf;
-	char name[64];
-	__u32 id = 0, len;
+	__u32 id = 0;
 	int err, fd;
 
 	if (obj->btf_modules_loaded)
@@ -5580,49 +5580,19 @@ static int load_module_btfs(struct bpf_object *obj)
 		return 0;
 
 	while (true) {
-		err = bpf_btf_get_next_id(id, &id);
-		if (err && errno == ENOENT)
-			return 0;
-		if (err) {
-			err = -errno;
-			pr_warn("failed to iterate BTF objects: %d\n", err);
-			return err;
-		}
-
-		fd = bpf_btf_get_fd_by_id(id);
-		if (fd < 0) {
-			if (errno == ENOENT)
-				continue; /* expected race: BTF was unloaded */
-			err = -errno;
-			pr_warn("failed to get BTF object #%d FD: %d\n", id, err);
-			return err;
-		}
-
-		len = sizeof(info);
 		memset(&info, 0, sizeof(info));
 		info.name = ptr_to_u64(name);
 		info.name_len = sizeof(name);
 
-		err = bpf_obj_get_info_by_fd(fd, &info, &len);
-		if (err) {
-			err = -errno;
-			pr_warn("failed to get BTF object #%d info: %d\n", id, err);
-			goto err_out;
-		}
-
-		/* ignore non-module BTFs */
-		if (!info.kernel_btf || strcmp(name, "vmlinux") == 0) {
-			close(fd);
-			continue;
-		}
-
-		btf = btf_get_from_fd(fd, obj->btf_vmlinux);
+		btf = btf_load_next_with_info(id, &info, obj->btf_vmlinux,
+					      false);
 		err = libbpf_get_error(btf);
-		if (err) {
-			pr_warn("failed to load module [%s]'s BTF object #%d: %d\n",
-				name, id, err);
-			goto err_out;
-		}
+		if (err)
+			return err == -ENOENT ? 0 : err;
+
+		fd = btf__fd(btf);
+		btf__set_fd(btf, -1);
+		id = btf_obj_id(btf);
 
 		err = libbpf_ensure_mem((void **)&obj->btf_modules, &obj->btf_module_cap,
 					sizeof(*obj->btf_modules), obj->btf_module_cnt + 1);
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index a1ad145ffa74..9b0bbd4a5f64 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -366,9 +366,14 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
 			 const char *str_sec, size_t str_len);
 int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level);
-struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
 void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
 				const char **prefix, int *kind);
+__u32 btf_obj_id(const struct btf *btf);
+
+#define BTF_NAME_BUF_LEN 64
+
+struct btf *btf_load_next_with_info(__u32 start_id, struct bpf_btf_info *info,
+				    struct btf *base_btf, bool vmlinux);
 
 struct btf_ext_info {
 	/*
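
For illustration, here is a minimal sketch of how a caller inside libbpf might walk all module BTFs with the new helper, mirroring the load_module_btfs() rework above. It assumes access to libbpf's internal headers: btf_load_next_with_info(), btf_obj_id() and BTF_NAME_BUF_LEN are internal, not public API, and for_each_module_btf() is a hypothetical name.

#include <stdbool.h>
#include <string.h>
#include <bpf/btf.h>
#include <bpf/libbpf.h>
#include "libbpf_internal.h"	/* internal-only declarations */

static int for_each_module_btf(struct btf *vmlinux_btf)
{
	char name[BTF_NAME_BUF_LEN] = { };
	struct bpf_btf_info info;
	__u32 id = 0;

	while (true) {
		struct btf *btf;
		int err;

		/* The caller initializes the name buffer pointers; the
		 * helper fills in the rest of @info on each iteration.
		 */
		memset(&info, 0, sizeof(info));
		info.name = (__u64)(unsigned long)name;
		info.name_len = sizeof(name);

		btf = btf_load_next_with_info(id, &info, vmlinux_btf,
					      false);
		err = libbpf_get_error(btf);
		if (err)
			/* -ENOENT: no more BTF objects past @id */
			return err == -ENOENT ? 0 : err;

		/* ... use the module BTF here ... */

		id = btf_obj_id(btf);	/* resume after this object */
		btf__free(btf);		/* also closes the held FD */
	}
}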

From patchwork Tue Jun 28 19:47:22 2022
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12898837
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen, Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng, Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [PATCH RFC bpf-next 02/52] libbpf: try to load vmlinux BTF from the kernel first
Date: Tue, 28 Jun 2022 21:47:22 +0200
Message-Id: <20220628194812.1453059-3-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
X-Patchwork-State: RFC

From: Larysa Zaremba

Try to acquire the vmlinux BTF the same way it's done for module BTFs:
use btf_load_next_with_info() and resort to the filesystem lookup only
if that fails. Also, adjust the debug messages in
btf__load_vmlinux_btf() to reflect that it actually tries to load the
vmlinux BTF.
Signed-off-by: Larysa Zaremba
Signed-off-by: Alexander Lobakin
---
 tools/lib/bpf/btf.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 7e4dbf71fd52..8ecd50923fab 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -4927,6 +4927,25 @@ struct btf *btf_load_next_with_info(__u32 start_id, struct bpf_btf_info *info,
 	}
 }
 
+static struct btf *btf_load_vmlinux_from_kernel(void)
+{
+	char name[BTF_NAME_BUF_LEN] = { };
+	struct bpf_btf_info info;
+	struct btf *btf;
+
+	memset(&info, 0, sizeof(info));
+	info.name = ptr_to_u64(name);
+	info.name_len = sizeof(name);
+
+	btf = btf_load_next_with_info(0, &info, NULL, true);
+	if (!libbpf_get_error(btf)) {
+		close(btf->fd);
+		btf__set_fd(btf, -1);
+	}
+
+	return btf;
+}
+
 /*
  * Probe few well-known locations for vmlinux kernel image and try to load BTF
  * data out of it to use for target BTF.
@@ -4953,6 +4972,15 @@ struct btf *btf__load_vmlinux_btf(void)
 	struct btf *btf;
 	int i, err;
 
+	btf = btf_load_vmlinux_from_kernel();
+	err = libbpf_get_error(btf);
+	pr_debug("loading vmlinux BTF from kernel: %d\n", err);
+	if (!err)
+		return btf;
+
+	pr_info("failed to load vmlinux BTF from kernel: %d, will look through filesystem\n",
+		err);
+
 	uname(&buf);
 
 	for (i = 0; i < ARRAY_SIZE(locations); i++) {
@@ -4966,14 +4994,14 @@ struct btf *btf__load_vmlinux_btf(void)
 		else
 			btf = btf__parse_elf(path, NULL);
 		err = libbpf_get_error(btf);
-		pr_debug("loading kernel BTF '%s': %d\n", path, err);
+		pr_debug("loading vmlinux BTF '%s': %d\n", path, err);
 		if (err)
 			continue;
 
 		return btf;
 	}
 
-	pr_warn("failed to find valid kernel BTF\n");
+	pr_warn("failed to find valid vmlinux BTF\n");
 	return libbpf_err_ptr(-ESRCH);
 }
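
For callers, the change is transparent: the public entry point remains btf__load_vmlinux_btf(). A minimal sketch of an application using it, assuming only the public libbpf API:

#include <stdio.h>
#include <bpf/btf.h>
#include <bpf/libbpf.h>

int main(void)
{
	/* After this patch, this first asks the running kernel for its
	 * vmlinux BTF via the BTF object iteration API and only then
	 * falls back to the well-known filesystem locations. The API
	 * and its error semantics are unchanged.
	 */
	struct btf *btf = btf__load_vmlinux_btf();
	long err = libbpf_get_error(btf);

	if (err) {
		fprintf(stderr, "no vmlinux BTF found: %ld\n", err);
		return 1;
	}

	printf("vmlinux BTF loaded, %u types\n", btf__type_cnt(btf));
	btf__free(btf);
	return 0;
}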
d="scan'208";a="345828284" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="680182418" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by FMSMGA003.fm.intel.com with ESMTP; 28 Jun 2022 12:48:59 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr93022013; Tue, 28 Jun 2022 20:48:57 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 03/52] libbpf: add function to get the pair BTF ID + type ID for a given type Date: Tue, 28 Jun 2022 21:47:23 +0200 Message-Id: <20220628194812.1453059-4-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add new libbpf API function libbpf_get_type_btf_id() to provide a short way to get the pair of BTF ID << 32 | type ID for the provided type. The primary purpose is to use it in userspace BPF prog loaders to pass those IDs to the kernel to tell what XDP generic metadata to create, as well as in AF_XDP programs to be able to compare them against the ones from frame metadata. Signed-off-by: Alexander Lobakin --- tools/lib/bpf/libbpf.c | 113 +++++++++++++++++++++++++++++++++++++++ tools/lib/bpf/libbpf.h | 1 + tools/lib/bpf/libbpf.map | 1 + 3 files changed, 115 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 8e27bad5e80f..9bda111c8167 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -2252,6 +2252,28 @@ const char *btf_kind_str(const struct btf_type *t) return __btf_kind_str(btf_kind(t)); } +static __u32 btf_kind_from_str(const char **type) +{ + const char *pos, *orig = *type; + __u32 kind; + int len; + + pos = strchr(orig, ' '); + if (pos) { + len = pos - orig; + *type = pos + 1; + } else { + len = strlen(orig); + } + + for (kind = BTF_KIND_UNKN; kind < NR_BTF_KINDS; kind++) { + if (!strncmp(orig, __btf_kind_str(kind), len)) + break; + } + + return kind < NR_BTF_KINDS ? kind : BTF_KIND_UNKN; +} + /* * Fetch integer attribute of BTF map definition. 
  * represented using a pointer to an array, in which dimensionality of array
@@ -9617,6 +9639,97 @@ int libbpf_find_vmlinux_btf_id(const char *name,
 	return libbpf_err(err);
 }
 
+static __s32 libbpf_find_btf_id(const char *type, __u32 kind,
+				struct btf **res_btf)
+{
+	char name[BTF_NAME_BUF_LEN] = { };
+	struct btf *vmlinux_btf, *btf;
+	struct bpf_btf_info info;
+	__u32 id = 0;
+	__s32 ret;
+
+	if (res_btf)
+		*res_btf = NULL;
+
+	if (!type || !*type)
+		return -EINVAL;
+
+	vmlinux_btf = btf__load_vmlinux_btf();
+	ret = libbpf_get_error(vmlinux_btf);
+	if (ret < 0)
+		goto free_vmlinux;
+
+	ret = btf__find_by_name_kind(vmlinux_btf, type, kind);
+	if (ret > 0) {
+		btf = vmlinux_btf;
+		goto out;
+	}
+
+	while (true) {
+		memset(&info, 0, sizeof(info));
+		info.name = ptr_to_u64(name);
+		info.name_len = sizeof(name);
+
+		btf = btf_load_next_with_info(id, &info, vmlinux_btf, false);
+		ret = libbpf_get_error(btf);
+		if (ret)
+			break;
+
+		ret = btf__find_by_name_kind(btf, type, kind);
+		if (ret > 0)
+			break;
+
+		id = btf_obj_id(btf);
+		btf__free(btf);
+	}
+
+free_vmlinux:
+	btf__free(vmlinux_btf);
+
+out:
+	if (ret > 0 && res_btf)
+		*res_btf = btf;
+
+	return ret ? : -ESRCH;
+}
+
+/**
+ * libbpf_get_type_btf_id - get the pair BTF ID + type ID for a given type
+ * @type: pointer to the name of the type to look for
+ * @res_id: pointer to write the result to
+ *
+ * Tries to find the BTF corresponding to the provided type (full string) and
+ * write the pair of BTF ID << 32 | type ID. Such coded __u64 values are used
+ * in XDP generic-compatible metadata to distinguish between different
+ * metadata structures.
+ * @res_id can be %NULL to only check if a particular type exists within
+ * the BTF.
+ *
+ * Returns 0 in case of success, -errno otherwise.
+ */
+int libbpf_get_type_btf_id(const char *type, __u64 *res_id)
+{
+	struct btf *btf = NULL;
+	__s32 type_id;
+	__u32 kind;
+
+	if (res_id)
+		*res_id = 0;
+
+	if (!type || !*type)
+		return libbpf_err(-EINVAL);
+
+	kind = btf_kind_from_str(&type);
+
+	type_id = libbpf_find_btf_id(type, kind, &btf);
+	if (type_id > 0 && res_id)
+		*res_id = ((__u64)btf_obj_id(btf) << 32) | type_id;
+
+	btf__free(btf);
+
+	return libbpf_err(min(type_id, 0));
+}
+
 static int libbpf_find_prog_btf_id(const char *name, __u32 attach_prog_fd)
 {
 	struct bpf_prog_info info = {};
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index fa27969da0da..4056e9038086 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -295,6 +295,7 @@ LIBBPF_API int libbpf_attach_type_by_name(const char *name,
 					  enum bpf_attach_type *attach_type);
 LIBBPF_API int libbpf_find_vmlinux_btf_id(const char *name,
 					  enum bpf_attach_type attach_type);
+LIBBPF_API int libbpf_get_type_btf_id(const char *type, __u64 *id);
 
 /* Accessors of bpf_program */
 struct bpf_program;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 116a2a8ee7c2..f0987df15b7a 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -468,6 +468,7 @@ LIBBPF_1.0.0 {
 		libbpf_bpf_link_type_str;
 		libbpf_bpf_map_type_str;
 		libbpf_bpf_prog_type_str;
+		libbpf_get_type_btf_id;
 
 	local: *;
 };
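
A hedged usage sketch for the new API follows; "struct xdp_meta_generic" is purely an illustrative type name, any "[kind ]name" string understood by btf_kind_from_str() above would do:

#include <stdio.h>
#include <linux/types.h>
#include <bpf/libbpf.h>

int main(void)
{
	__u64 full_id;
	int err;

	/* Resolve the type to BTF ID << 32 | type ID; the type name
	 * here is illustrative, not part of this series' API.
	 */
	err = libbpf_get_type_btf_id("struct xdp_meta_generic", &full_id);
	if (err) {
		fprintf(stderr, "type lookup failed: %d\n", err);
		return 1;
	}

	printf("BTF obj ID: %u, type ID: %u\n",
	       (__u32)(full_id >> 32), (__u32)full_id);
	return 0;
}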

From patchwork Tue Jun 28 19:47:24 2022
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12898838
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen, Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng, Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [PATCH RFC bpf-next 04/52] libbpf: patch module BTF ID into BPF insns
Date: Tue, 28 Jun 2022 21:47:24 +0200
Message-Id: <20220628194812.1453059-5-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
X-Patchwork-State: RFC

From: Larysa Zaremba

Return both the type ID and the BTF ID from bpf_core_type_id_kernel().
Earlier, only the type ID was returned despite the fact that LLVM has
enabled a 64-bit return type for this instruction [1]. This was done as
a preparation for the patch [2], which also strongly served as an
inspiration for this implementation.

[1] https://reviews.llvm.org/D91489
[2] https://lore.kernel.org/all/20201205025140.443115-1-andrii@kernel.org

Signed-off-by: Larysa Zaremba
Signed-off-by: Alexander Lobakin
---
 tools/lib/bpf/bpf_core_read.h | 3 ++-
 tools/lib/bpf/relo_core.c     | 8 +++++++-
 tools/lib/bpf/relo_core.h     | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index fd48b1ff59ca..2b7d675b2dd0 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -167,7 +167,8 @@ enum bpf_enum_value_kind {
  * Convenience macro to get BTF type ID of a target kernel's type that matches
  * specified local type.
  * Returns:
- *    - valid 32-bit unsigned type ID in kernel BTF;
+ *    - valid 64-bit unsigned integer: the upper 32 bits is the BTF ID
+ *      and the lower 32 bits is the type ID within the BTF;
 *    - 0, if no matching type was found in a target kernel BTF.
  */
 #define bpf_core_type_id_kernel(type)					\
diff --git a/tools/lib/bpf/relo_core.c b/tools/lib/bpf/relo_core.c
index e070123332cd..020f0f81374c 100644
--- a/tools/lib/bpf/relo_core.c
+++ b/tools/lib/bpf/relo_core.c
@@ -884,6 +884,7 @@ static int bpf_core_calc_relo(const char *prog_name,
 	res->fail_memsz_adjust = false;
 	res->orig_sz = res->new_sz = 0;
 	res->orig_type_id = res->new_type_id = 0;
+	res->btf_obj_id = 0;
 
 	if (core_relo_is_field_based(relo->kind)) {
 		err = bpf_core_calc_field_relo(prog_name, relo, local_spec,
@@ -934,6 +935,8 @@ static int bpf_core_calc_relo(const char *prog_name,
 	} else if (core_relo_is_type_based(relo->kind)) {
 		err = bpf_core_calc_type_relo(relo, local_spec, &res->orig_val, &res->validate);
 		err = err ?: bpf_core_calc_type_relo(relo, targ_spec, &res->new_val, NULL);
+		if (!err && relo->kind == BPF_CORE_TYPE_ID_TARGET)
+			res->btf_obj_id = btf_obj_id(targ_spec->btf);
 	} else if (core_relo_is_enumval_based(relo->kind)) {
 		err = bpf_core_calc_enumval_relo(relo, local_spec, &res->orig_val);
 		err = err ?: bpf_core_calc_enumval_relo(relo, targ_spec, &res->new_val);
@@ -1125,7 +1128,10 @@ int bpf_core_patch_insn(const char *prog_name, struct bpf_insn *insn,
 	}
 
 	insn[0].imm = new_val;
-	insn[1].imm = new_val >> 32;
+	/* For type IDs, upper 32 bits are used for BTF ID */
+	insn[1].imm = relo->kind == BPF_CORE_TYPE_ID_TARGET ?
+		      res->btf_obj_id :
+		      (new_val >> 32);
 	pr_debug("prog '%s': relo #%d: patched insn #%d (LDIMM64) imm64 %llu -> %llu\n",
 		 prog_name, relo_idx, insn_idx,
 		 (unsigned long long)imm, (unsigned long long)new_val);
diff --git a/tools/lib/bpf/relo_core.h b/tools/lib/bpf/relo_core.h
index 3fd3842d4230..f026ea36140e 100644
--- a/tools/lib/bpf/relo_core.h
+++ b/tools/lib/bpf/relo_core.h
@@ -66,6 +66,7 @@ struct bpf_core_relo_res {
 	__u32 orig_type_id;
 	__u32 new_sz;
 	__u32 new_type_id;
+	__u32 btf_obj_id;
 };
 
 int __bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id,
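
On the BPF program side, bpf_core_type_id_kernel() now evaluates to a 64-bit value that can be compared against an ID obtained in userspace via libbpf_get_type_btf_id(). A minimal sketch, assuming an LLVM version with the 64-bit return type [1]; the program logic is illustrative only:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("xdp")
int xdp_meta_filter(struct xdp_md *ctx)
{
	/* With this patch, the upper 32 bits carry the BTF object ID
	 * patched into the LDIMM64 insn; the lower 32 bits are the
	 * type ID, as before.
	 */
	__u64 full_id = bpf_core_type_id_kernel(struct xdp_md);

	if (!full_id)		/* no match in the target kernel BTF */
		return XDP_ABORTED;

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";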

From patchwork Tue Jun 28 19:47:25 2022
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12898841
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen, Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng, Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [PATCH RFC bpf-next 05/52] net, xdp: decouple XDP code from the core networking code
Date: Tue, 28 Jun 2022 21:47:25 +0200
Message-Id: <20220628194812.1453059-6-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
X-Patchwork-State: RFC

Currently, a couple of rather big pieces of purely XDP code reside in
`net/core/dev.c` and `net/core/filter.c`, and they won't get smaller
any time soon. To make this more scalable, move them to new separate
files inside `net/bpf/` (which is almost empty now), along with
`net/core/xdp.c`. The split is clean: only 3 previously static
functions (plus 1 static key) had to be made global. The only mentions
of XDP left in `filter.c` are helpers which share code with their skb
variants; making that shared code global would cost much more.
Signed-off-by: Alexander Lobakin --- MAINTAINERS | 4 +- include/linux/filter.h | 2 + include/linux/netdevice.h | 5 + net/bpf/Makefile | 5 +- net/{core/xdp.c => bpf/core.c} | 2 +- net/bpf/dev.c | 776 ++++++++++++++++++++++++++++ net/bpf/prog_ops.c | 911 +++++++++++++++++++++++++++++++++ net/core/Makefile | 2 +- net/core/dev.c | 771 ---------------------------- net/core/dev.h | 4 - net/core/filter.c | 883 +------------------------------- 11 files changed, 1705 insertions(+), 1660 deletions(-) rename net/{core/xdp.c => bpf/core.c} (99%) create mode 100644 net/bpf/dev.c create mode 100644 net/bpf/prog_ops.c diff --git a/MAINTAINERS b/MAINTAINERS index ca95b1833b97..91190e12a157 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21726,7 +21726,9 @@ F: include/net/xdp_priv.h F: include/trace/events/xdp.h F: kernel/bpf/cpumap.c F: kernel/bpf/devmap.c -F: net/core/xdp.c +F: net/bpf/core.c +F: net/bpf/dev.c +F: net/bpf/prog_ops.c F: samples/bpf/xdp* F: tools/testing/selftests/bpf/*xdp* F: tools/testing/selftests/bpf/*/*xdp* diff --git a/include/linux/filter.h b/include/linux/filter.h index 4c1a8b247545..360e60a425ad 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -992,6 +992,8 @@ void xdp_do_flush(void); #define xdp_do_flush_map xdp_do_flush void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act); +const struct bpf_func_proto *xdp_inet_func_proto(enum bpf_func_id func_id); +bool xdp_helper_changes_pkt_data(const void *func); #ifdef CONFIG_INET struct sock *bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk, diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 89afa4f7747d..0b8169c23f22 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3848,7 +3848,12 @@ struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *d struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq, int *ret); +DECLARE_STATIC_KEY_FALSE(generic_xdp_needed_key); + int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); +int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, + int fd, int expected_fd, u32 flags); +void dev_xdp_uninstall(struct net_device *dev); u8 dev_xdp_prog_count(struct net_device *dev); u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode); diff --git a/net/bpf/Makefile b/net/bpf/Makefile index 1ebe270bde23..715550f9048b 100644 --- a/net/bpf/Makefile +++ b/net/bpf/Makefile @@ -1,5 +1,8 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-$(CONFIG_BPF_SYSCALL) := test_run.o + +obj-y := core.o dev.o prog_ops.o + +obj-$(CONFIG_BPF_SYSCALL) += test_run.o ifeq ($(CONFIG_BPF_JIT),y) obj-$(CONFIG_BPF_SYSCALL) += bpf_dummy_struct_ops.o endif diff --git a/net/core/xdp.c b/net/bpf/core.c similarity index 99% rename from net/core/xdp.c rename to net/bpf/core.c index 24420209bf0e..fbb72792320a 100644 --- a/net/core/xdp.c +++ b/net/bpf/core.c @@ -1,5 +1,5 @@ // SPDX-License-Identifier: GPL-2.0-only -/* net/core/xdp.c +/* net/bpf/core.c * * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc. 
*/ diff --git a/net/bpf/dev.c b/net/bpf/dev.c new file mode 100644 index 000000000000..dfe0402947f8 --- /dev/null +++ b/net/bpf/dev.c @@ -0,0 +1,776 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include + +DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key); + +static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + struct netdev_rx_queue *rxqueue; + + rxqueue = dev->_rx; + + if (skb_rx_queue_recorded(skb)) { + u16 index = skb_get_rx_queue(skb); + + if (unlikely(index >= dev->real_num_rx_queues)) { + WARN_ONCE(dev->real_num_rx_queues > 1, + "%s received packet on queue %u, but number " + "of RX queues is %u\n", + dev->name, index, dev->real_num_rx_queues); + + return rxqueue; /* Return first rxqueue */ + } + rxqueue += index; + } + return rxqueue; +} + +u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, + struct bpf_prog *xdp_prog) +{ + void *orig_data, *orig_data_end, *hard_start; + struct netdev_rx_queue *rxqueue; + bool orig_bcast, orig_host; + u32 mac_len, frame_sz; + __be16 orig_eth_type; + struct ethhdr *eth; + u32 metalen, act; + int off; + + /* The XDP program wants to see the packet starting at the MAC + * header. + */ + mac_len = skb->data - skb_mac_header(skb); + hard_start = skb->data - skb_headroom(skb); + + /* SKB "head" area always have tailroom for skb_shared_info */ + frame_sz = (void *)skb_end_pointer(skb) - hard_start; + frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + + rxqueue = netif_get_rxqueue(skb); + xdp_init_buff(xdp, frame_sz, &rxqueue->xdp_rxq); + xdp_prepare_buff(xdp, hard_start, skb_headroom(skb) - mac_len, + skb_headlen(skb) + mac_len, true); + + orig_data_end = xdp->data_end; + orig_data = xdp->data; + eth = (struct ethhdr *)xdp->data; + orig_host = ether_addr_equal_64bits(eth->h_dest, skb->dev->dev_addr); + orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest); + orig_eth_type = eth->h_proto; + + act = bpf_prog_run_xdp(xdp_prog, xdp); + + /* check if bpf_xdp_adjust_head was used */ + off = xdp->data - orig_data; + if (off) { + if (off > 0) + __skb_pull(skb, off); + else if (off < 0) + __skb_push(skb, -off); + + skb->mac_header += off; + skb_reset_network_header(skb); + } + + /* check if bpf_xdp_adjust_tail was used */ + off = xdp->data_end - orig_data_end; + if (off != 0) { + skb_set_tail_pointer(skb, xdp->data_end - xdp->data); + skb->len += off; /* positive on grow, negative on shrink */ + } + + /* check if XDP changed eth hdr such SKB needs update */ + eth = (struct ethhdr *)xdp->data; + if ((orig_eth_type != eth->h_proto) || + (orig_host != ether_addr_equal_64bits(eth->h_dest, + skb->dev->dev_addr)) || + (orig_bcast != is_multicast_ether_addr_64bits(eth->h_dest))) { + __skb_push(skb, ETH_HLEN); + skb->pkt_type = PACKET_HOST; + skb->protocol = eth_type_trans(skb, skb->dev); + } + + /* Redirect/Tx gives L2 packet, code that will reuse skb must __skb_pull + * before calling us again on redirect path. We do not call do_redirect + * as we leave that up to the caller. + * + * Caller is responsible for managing lifetime of skb (i.e. calling + * kfree_skb in response to actions it cannot handle/XDP_DROP). 
+ */ + switch (act) { + case XDP_REDIRECT: + case XDP_TX: + __skb_push(skb, mac_len); + break; + case XDP_PASS: + metalen = xdp->data - xdp->data_meta; + if (metalen) + skb_metadata_set(skb, metalen); + break; + } + + return act; +} + +static u32 netif_receive_generic_xdp(struct sk_buff *skb, + struct xdp_buff *xdp, + struct bpf_prog *xdp_prog) +{ + u32 act = XDP_DROP; + + /* Reinjected packets coming from act_mirred or similar should + * not get XDP generic processing. + */ + if (skb_is_redirected(skb)) + return XDP_PASS; + + /* XDP packets must be linear and must have sufficient headroom + * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also + * native XDP provides, thus we need to do it here as well. + */ + if (skb_cloned(skb) || skb_is_nonlinear(skb) || + skb_headroom(skb) < XDP_PACKET_HEADROOM) { + int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb); + int troom = skb->tail + skb->data_len - skb->end; + + /* In case we have to go down the path and also linearize, + * then lets do the pskb_expand_head() work just once here. + */ + if (pskb_expand_head(skb, + hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0, + troom > 0 ? troom + 128 : 0, GFP_ATOMIC)) + goto do_drop; + if (skb_linearize(skb)) + goto do_drop; + } + + act = bpf_prog_run_generic_xdp(skb, xdp, xdp_prog); + switch (act) { + case XDP_REDIRECT: + case XDP_TX: + case XDP_PASS: + break; + default: + bpf_warn_invalid_xdp_action(skb->dev, xdp_prog, act); + fallthrough; + case XDP_ABORTED: + trace_xdp_exception(skb->dev, xdp_prog, act); + fallthrough; + case XDP_DROP: + do_drop: + kfree_skb(skb); + break; + } + + return act; +} + +/* When doing generic XDP we have to bypass the qdisc layer and the + * network taps in order to match in-driver-XDP behavior. + */ +void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog) +{ + struct net_device *dev = skb->dev; + struct netdev_queue *txq; + bool free_skb = true; + int cpu, rc; + + txq = netdev_core_pick_tx(dev, skb, NULL); + cpu = smp_processor_id(); + HARD_TX_LOCK(dev, txq, cpu); + if (!netif_xmit_stopped(txq)) { + rc = netdev_start_xmit(skb, dev, txq, 0); + if (dev_xmit_complete(rc)) + free_skb = false; + } + HARD_TX_UNLOCK(dev, txq); + if (free_skb) { + trace_xdp_exception(dev, xdp_prog, XDP_TX); + kfree_skb(skb); + } +} + +int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) +{ + if (xdp_prog) { + struct xdp_buff xdp; + u32 act; + int err; + + act = netif_receive_generic_xdp(skb, &xdp, xdp_prog); + if (act != XDP_PASS) { + switch (act) { + case XDP_REDIRECT: + err = xdp_do_generic_redirect(skb->dev, skb, + &xdp, xdp_prog); + if (err) + goto out_redir; + break; + case XDP_TX: + generic_xdp_tx(skb, xdp_prog); + break; + } + return XDP_DROP; + } + } + return XDP_PASS; +out_redir: + kfree_skb_reason(skb, SKB_DROP_REASON_XDP); + return XDP_DROP; +} +EXPORT_SYMBOL_GPL(do_xdp_generic); + +/** + * dev_disable_gro_hw - disable HW Generic Receive Offload on a device + * @dev: device + * + * Disable HW Generic Receive Offload (GRO_HW) on a net device. Must be + * called under RTNL. This is needed if Generic XDP is installed on + * the device. 
+ */ +static void dev_disable_gro_hw(struct net_device *dev) +{ + dev->wanted_features &= ~NETIF_F_GRO_HW; + netdev_update_features(dev); + + if (unlikely(dev->features & NETIF_F_GRO_HW)) + netdev_WARN(dev, "failed to disable GRO_HW!\n"); +} + +static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp) +{ + struct bpf_prog *old = rtnl_dereference(dev->xdp_prog); + struct bpf_prog *new = xdp->prog; + int ret = 0; + + switch (xdp->command) { + case XDP_SETUP_PROG: + rcu_assign_pointer(dev->xdp_prog, new); + if (old) + bpf_prog_put(old); + + if (old && !new) { + static_branch_dec(&generic_xdp_needed_key); + } else if (new && !old) { + static_branch_inc(&generic_xdp_needed_key); + dev_disable_lro(dev); + dev_disable_gro_hw(dev); + } + break; + + default: + ret = -EINVAL; + break; + } + + return ret; +} + +struct bpf_xdp_link { + struct bpf_link link; + struct net_device *dev; /* protected by rtnl_lock, no refcnt held */ + int flags; +}; + +typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); + +static enum bpf_xdp_mode dev_xdp_mode(struct net_device *dev, u32 flags) +{ + if (flags & XDP_FLAGS_HW_MODE) + return XDP_MODE_HW; + if (flags & XDP_FLAGS_DRV_MODE) + return XDP_MODE_DRV; + if (flags & XDP_FLAGS_SKB_MODE) + return XDP_MODE_SKB; + return dev->netdev_ops->ndo_bpf ? XDP_MODE_DRV : XDP_MODE_SKB; +} + +static bpf_op_t dev_xdp_bpf_op(struct net_device *dev, enum bpf_xdp_mode mode) +{ + switch (mode) { + case XDP_MODE_SKB: + return generic_xdp_install; + case XDP_MODE_DRV: + case XDP_MODE_HW: + return dev->netdev_ops->ndo_bpf; + default: + return NULL; + } +} + +static struct bpf_xdp_link *dev_xdp_link(struct net_device *dev, + enum bpf_xdp_mode mode) +{ + return dev->xdp_state[mode].link; +} + +static struct bpf_prog *dev_xdp_prog(struct net_device *dev, + enum bpf_xdp_mode mode) +{ + struct bpf_xdp_link *link = dev_xdp_link(dev, mode); + + if (link) + return link->link.prog; + return dev->xdp_state[mode].prog; +} + +u8 dev_xdp_prog_count(struct net_device *dev) +{ + u8 count = 0; + int i; + + for (i = 0; i < __MAX_XDP_MODE; i++) + if (dev->xdp_state[i].prog || dev->xdp_state[i].link) + count++; + return count; +} +EXPORT_SYMBOL_GPL(dev_xdp_prog_count); + +u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode) +{ + struct bpf_prog *prog = dev_xdp_prog(dev, mode); + + return prog ? prog->aux->id : 0; +} + +static void dev_xdp_set_link(struct net_device *dev, enum bpf_xdp_mode mode, + struct bpf_xdp_link *link) +{ + dev->xdp_state[mode].link = link; + dev->xdp_state[mode].prog = NULL; +} + +static void dev_xdp_set_prog(struct net_device *dev, enum bpf_xdp_mode mode, + struct bpf_prog *prog) +{ + dev->xdp_state[mode].link = NULL; + dev->xdp_state[mode].prog = prog; +} + +static int dev_xdp_install(struct net_device *dev, enum bpf_xdp_mode mode, + bpf_op_t bpf_op, struct netlink_ext_ack *extack, + u32 flags, struct bpf_prog *prog) +{ + struct netdev_bpf xdp; + int err; + + memset(&xdp, 0, sizeof(xdp)); + xdp.command = mode == XDP_MODE_HW ? XDP_SETUP_PROG_HW : XDP_SETUP_PROG; + xdp.extack = extack; + xdp.flags = flags; + xdp.prog = prog; + + /* Drivers assume refcnt is already incremented (i.e, prog pointer is + * "moved" into driver), so they don't increment it on their own, but + * they do decrement refcnt when program is detached or replaced. + * Given net_device also owns link/prog, we need to bump refcnt here + * to prevent drivers from underflowing it. 
+ */ + if (prog) + bpf_prog_inc(prog); + err = bpf_op(dev, &xdp); + if (err) { + if (prog) + bpf_prog_put(prog); + return err; + } + + if (mode != XDP_MODE_HW) + bpf_prog_change_xdp(dev_xdp_prog(dev, mode), prog); + + return 0; +} + +void dev_xdp_uninstall(struct net_device *dev) +{ + struct bpf_xdp_link *link; + struct bpf_prog *prog; + enum bpf_xdp_mode mode; + bpf_op_t bpf_op; + + ASSERT_RTNL(); + + for (mode = XDP_MODE_SKB; mode < __MAX_XDP_MODE; mode++) { + prog = dev_xdp_prog(dev, mode); + if (!prog) + continue; + + bpf_op = dev_xdp_bpf_op(dev, mode); + if (!bpf_op) + continue; + + WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL)); + + /* auto-detach link from net device */ + link = dev_xdp_link(dev, mode); + if (link) + link->dev = NULL; + else + bpf_prog_put(prog); + + dev_xdp_set_link(dev, mode, NULL); + } +} + +static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack, + struct bpf_xdp_link *link, struct bpf_prog *new_prog, + struct bpf_prog *old_prog, u32 flags) +{ + unsigned int num_modes = hweight32(flags & XDP_FLAGS_MODES); + struct bpf_prog *cur_prog; + struct net_device *upper; + struct list_head *iter; + enum bpf_xdp_mode mode; + bpf_op_t bpf_op; + int err; + + ASSERT_RTNL(); + + /* either link or prog attachment, never both */ + if (link && (new_prog || old_prog)) + return -EINVAL; + /* link supports only XDP mode flags */ + if (link && (flags & ~XDP_FLAGS_MODES)) { + NL_SET_ERR_MSG(extack, "Invalid XDP flags for BPF link attachment"); + return -EINVAL; + } + /* just one XDP mode bit should be set, zero defaults to drv/skb mode */ + if (num_modes > 1) { + NL_SET_ERR_MSG(extack, "Only one XDP mode flag can be set"); + return -EINVAL; + } + /* avoid ambiguity if offload + drv/skb mode progs are both loaded */ + if (!num_modes && dev_xdp_prog_count(dev) > 1) { + NL_SET_ERR_MSG(extack, + "More than one program loaded, unset mode is ambiguous"); + return -EINVAL; + } + /* old_prog != NULL implies XDP_FLAGS_REPLACE is set */ + if (old_prog && !(flags & XDP_FLAGS_REPLACE)) { + NL_SET_ERR_MSG(extack, "XDP_FLAGS_REPLACE is not specified"); + return -EINVAL; + } + + mode = dev_xdp_mode(dev, flags); + /* can't replace attached link */ + if (dev_xdp_link(dev, mode)) { + NL_SET_ERR_MSG(extack, "Can't replace active BPF XDP link"); + return -EBUSY; + } + + /* don't allow if an upper device already has a program */ + netdev_for_each_upper_dev_rcu(dev, upper, iter) { + if (dev_xdp_prog_count(upper) > 0) { + NL_SET_ERR_MSG(extack, "Cannot attach when an upper device already has a program"); + return -EEXIST; + } + } + + cur_prog = dev_xdp_prog(dev, mode); + /* can't replace attached prog with link */ + if (link && cur_prog) { + NL_SET_ERR_MSG(extack, "Can't replace active XDP program with BPF link"); + return -EBUSY; + } + if ((flags & XDP_FLAGS_REPLACE) && cur_prog != old_prog) { + NL_SET_ERR_MSG(extack, "Active program does not match expected"); + return -EEXIST; + } + + /* put effective new program into new_prog */ + if (link) + new_prog = link->link.prog; + + if (new_prog) { + bool offload = mode == XDP_MODE_HW; + enum bpf_xdp_mode other_mode = mode == XDP_MODE_SKB + ? 
XDP_MODE_DRV : XDP_MODE_SKB; + + if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) && cur_prog) { + NL_SET_ERR_MSG(extack, "XDP program already attached"); + return -EBUSY; + } + if (!offload && dev_xdp_prog(dev, other_mode)) { + NL_SET_ERR_MSG(extack, "Native and generic XDP can't be active at the same time"); + return -EEXIST; + } + if (!offload && bpf_prog_is_dev_bound(new_prog->aux)) { + NL_SET_ERR_MSG(extack, "Using device-bound program without HW_MODE flag is not supported"); + return -EINVAL; + } + if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) { + NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device"); + return -EINVAL; + } + if (new_prog->expected_attach_type == BPF_XDP_CPUMAP) { + NL_SET_ERR_MSG(extack, "BPF_XDP_CPUMAP programs can not be attached to a device"); + return -EINVAL; + } + } + + /* don't call drivers if the effective program didn't change */ + if (new_prog != cur_prog) { + bpf_op = dev_xdp_bpf_op(dev, mode); + if (!bpf_op) { + NL_SET_ERR_MSG(extack, "Underlying driver does not support XDP in native mode"); + return -EOPNOTSUPP; + } + + err = dev_xdp_install(dev, mode, bpf_op, extack, flags, new_prog); + if (err) + return err; + } + + if (link) + dev_xdp_set_link(dev, mode, link); + else + dev_xdp_set_prog(dev, mode, new_prog); + if (cur_prog) + bpf_prog_put(cur_prog); + + return 0; +} + +static int dev_xdp_attach_link(struct net_device *dev, + struct netlink_ext_ack *extack, + struct bpf_xdp_link *link) +{ + return dev_xdp_attach(dev, extack, link, NULL, NULL, link->flags); +} + +static int dev_xdp_detach_link(struct net_device *dev, + struct netlink_ext_ack *extack, + struct bpf_xdp_link *link) +{ + enum bpf_xdp_mode mode; + bpf_op_t bpf_op; + + ASSERT_RTNL(); + + mode = dev_xdp_mode(dev, link->flags); + if (dev_xdp_link(dev, mode) != link) + return -EINVAL; + + bpf_op = dev_xdp_bpf_op(dev, mode); + WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL)); + dev_xdp_set_link(dev, mode, NULL); + return 0; +} + +static void bpf_xdp_link_release(struct bpf_link *link) +{ + struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); + + rtnl_lock(); + + /* if racing with net_device's tear down, xdp_link->dev might be + * already NULL, in which case link was already auto-detached + */ + if (xdp_link->dev) { + WARN_ON(dev_xdp_detach_link(xdp_link->dev, NULL, xdp_link)); + xdp_link->dev = NULL; + } + + rtnl_unlock(); +} + +static int bpf_xdp_link_detach(struct bpf_link *link) +{ + bpf_xdp_link_release(link); + return 0; +} + +static void bpf_xdp_link_dealloc(struct bpf_link *link) +{ + struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); + + kfree(xdp_link); +} + +static void bpf_xdp_link_show_fdinfo(const struct bpf_link *link, + struct seq_file *seq) +{ + struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); + u32 ifindex = 0; + + rtnl_lock(); + if (xdp_link->dev) + ifindex = xdp_link->dev->ifindex; + rtnl_unlock(); + + seq_printf(seq, "ifindex:\t%u\n", ifindex); +} + +static int bpf_xdp_link_fill_link_info(const struct bpf_link *link, + struct bpf_link_info *info) +{ + struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); + u32 ifindex = 0; + + rtnl_lock(); + if (xdp_link->dev) + ifindex = xdp_link->dev->ifindex; + rtnl_unlock(); + + info->xdp.ifindex = ifindex; + return 0; +} + +static int bpf_xdp_link_update(struct bpf_link *link, struct bpf_prog *new_prog, + struct bpf_prog *old_prog) +{ + struct bpf_xdp_link *xdp_link = container_of(link, struct 
bpf_xdp_link, link); + enum bpf_xdp_mode mode; + bpf_op_t bpf_op; + int err = 0; + + rtnl_lock(); + + /* link might have been auto-released already, so fail */ + if (!xdp_link->dev) { + err = -ENOLINK; + goto out_unlock; + } + + if (old_prog && link->prog != old_prog) { + err = -EPERM; + goto out_unlock; + } + old_prog = link->prog; + if (old_prog->type != new_prog->type || + old_prog->expected_attach_type != new_prog->expected_attach_type) { + err = -EINVAL; + goto out_unlock; + } + + if (old_prog == new_prog) { + /* no-op, don't disturb drivers */ + bpf_prog_put(new_prog); + goto out_unlock; + } + + mode = dev_xdp_mode(xdp_link->dev, xdp_link->flags); + bpf_op = dev_xdp_bpf_op(xdp_link->dev, mode); + err = dev_xdp_install(xdp_link->dev, mode, bpf_op, NULL, + xdp_link->flags, new_prog); + if (err) + goto out_unlock; + + old_prog = xchg(&link->prog, new_prog); + bpf_prog_put(old_prog); + +out_unlock: + rtnl_unlock(); + return err; +} + +static const struct bpf_link_ops bpf_xdp_link_lops = { + .release = bpf_xdp_link_release, + .dealloc = bpf_xdp_link_dealloc, + .detach = bpf_xdp_link_detach, + .show_fdinfo = bpf_xdp_link_show_fdinfo, + .fill_link_info = bpf_xdp_link_fill_link_info, + .update_prog = bpf_xdp_link_update, +}; + +int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + struct net *net = current->nsproxy->net_ns; + struct bpf_link_primer link_primer; + struct bpf_xdp_link *link; + struct net_device *dev; + int err, fd; + + rtnl_lock(); + dev = dev_get_by_index(net, attr->link_create.target_ifindex); + if (!dev) { + rtnl_unlock(); + return -EINVAL; + } + + link = kzalloc(sizeof(*link), GFP_USER); + if (!link) { + err = -ENOMEM; + goto unlock; + } + + bpf_link_init(&link->link, BPF_LINK_TYPE_XDP, &bpf_xdp_link_lops, prog); + link->dev = dev; + link->flags = attr->link_create.flags; + + err = bpf_link_prime(&link->link, &link_primer); + if (err) { + kfree(link); + goto unlock; + } + + err = dev_xdp_attach_link(dev, NULL, link); + rtnl_unlock(); + + if (err) { + link->dev = NULL; + bpf_link_cleanup(&link_primer); + goto out_put_dev; + } + + fd = bpf_link_settle(&link_primer); + /* link itself doesn't hold dev's refcnt to not complicate shutdown */ + dev_put(dev); + return fd; + +unlock: + rtnl_unlock(); + +out_put_dev: + dev_put(dev); + return err; +} + +/** + * dev_change_xdp_fd - set or clear a bpf program for a device rx path + * @dev: device + * @extack: netlink extended ack + * @fd: new program fd or negative value to clear + * @expected_fd: old program fd that userspace expects to replace or clear + * @flags: xdp-related flags + * + * Set or clear a bpf program for a device + */ +int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, + int fd, int expected_fd, u32 flags) +{ + enum bpf_xdp_mode mode = dev_xdp_mode(dev, flags); + struct bpf_prog *new_prog = NULL, *old_prog = NULL; + int err; + + ASSERT_RTNL(); + + if (fd >= 0) { + new_prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP, + mode != XDP_MODE_SKB); + if (IS_ERR(new_prog)) + return PTR_ERR(new_prog); + } + + if (expected_fd >= 0) { + old_prog = bpf_prog_get_type_dev(expected_fd, BPF_PROG_TYPE_XDP, + mode != XDP_MODE_SKB); + if (IS_ERR(old_prog)) { + err = PTR_ERR(old_prog); + old_prog = NULL; + goto err_out; + } + } + + err = dev_xdp_attach(dev, extack, NULL, new_prog, old_prog, flags); + +err_out: + if (err && new_prog) + bpf_prog_put(new_prog); + if (old_prog) + bpf_prog_put(old_prog); + return err; +} diff --git a/net/bpf/prog_ops.c b/net/bpf/prog_ops.c new file mode 
100644 index 000000000000..33f02842e715 --- /dev/null +++ b/net/bpf/prog_ops.c @@ -0,0 +1,911 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include + +BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff*, xdp) +{ + return xdp_get_buff_len(xdp); +} + +static const struct bpf_func_proto bpf_xdp_get_buff_len_proto = { + .func = bpf_xdp_get_buff_len, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, +}; + +BTF_ID_LIST_SINGLE(bpf_xdp_get_buff_len_bpf_ids, struct, xdp_buff) + +const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto = { + .func = bpf_xdp_get_buff_len, + .gpl_only = false, + .arg1_type = ARG_PTR_TO_BTF_ID, + .arg1_btf_id = &bpf_xdp_get_buff_len_bpf_ids[0], +}; + +static unsigned long xdp_get_metalen(const struct xdp_buff *xdp) +{ + return xdp_data_meta_unsupported(xdp) ? 0 : + xdp->data - xdp->data_meta; +} + +BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset) +{ + void *xdp_frame_end = xdp->data_hard_start + sizeof(struct xdp_frame); + unsigned long metalen = xdp_get_metalen(xdp); + void *data_start = xdp_frame_end + metalen; + void *data = xdp->data + offset; + + if (unlikely(data < data_start || + data > xdp->data_end - ETH_HLEN)) + return -EINVAL; + + if (metalen) + memmove(xdp->data_meta + offset, + xdp->data_meta, metalen); + xdp->data_meta += offset; + xdp->data = data; + + return 0; +} + +static const struct bpf_func_proto bpf_xdp_adjust_head_proto = { + .func = bpf_xdp_adjust_head, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_ANYTHING, +}; + +static void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, + void *buf, unsigned long len, bool flush) +{ + unsigned long ptr_len, ptr_off = 0; + skb_frag_t *next_frag, *end_frag; + struct skb_shared_info *sinfo; + void *src, *dst; + u8 *ptr_buf; + + if (likely(xdp->data_end - xdp->data >= off + len)) { + src = flush ? buf : xdp->data + off; + dst = flush ? xdp->data + off : buf; + memcpy(dst, src, len); + return; + } + + sinfo = xdp_get_shared_info_from_buff(xdp); + end_frag = &sinfo->frags[sinfo->nr_frags]; + next_frag = &sinfo->frags[0]; + + ptr_len = xdp->data_end - xdp->data; + ptr_buf = xdp->data; + + while (true) { + if (off < ptr_off + ptr_len) { + unsigned long copy_off = off - ptr_off; + unsigned long copy_len = min(len, ptr_len - copy_off); + + src = flush ? buf : ptr_buf + copy_off; + dst = flush ? ptr_buf + copy_off : buf; + memcpy(dst, src, copy_len); + + off += copy_len; + len -= copy_len; + buf += copy_len; + } + + if (!len || next_frag == end_frag) + break; + + ptr_off += ptr_len; + ptr_buf = skb_frag_address(next_frag); + ptr_len = skb_frag_size(next_frag); + next_frag++; + } +} + +static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) +{ + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); + u32 size = xdp->data_end - xdp->data; + void *addr = xdp->data; + int i; + + if (unlikely(offset > 0xffff || len > 0xffff)) + return ERR_PTR(-EFAULT); + + if (offset + len > xdp_get_buff_len(xdp)) + return ERR_PTR(-EINVAL); + + if (offset < size) /* linear area */ + goto out; + + offset -= size; + for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */ + u32 frag_size = skb_frag_size(&sinfo->frags[i]); + + if (offset < frag_size) { + addr = skb_frag_address(&sinfo->frags[i]); + size = frag_size; + break; + } + offset -= frag_size; + } +out: + return offset + len < size ? 
addr + offset : NULL; +} + +BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset, + void *, buf, u32, len) +{ + void *ptr; + + ptr = bpf_xdp_pointer(xdp, offset, len); + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + + if (!ptr) + bpf_xdp_copy_buf(xdp, offset, buf, len, false); + else + memcpy(buf, ptr, len); + + return 0; +} + +static const struct bpf_func_proto bpf_xdp_load_bytes_proto = { + .func = bpf_xdp_load_bytes, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_PTR_TO_UNINIT_MEM, + .arg4_type = ARG_CONST_SIZE, +}; + +BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, + void *, buf, u32, len) +{ + void *ptr; + + ptr = bpf_xdp_pointer(xdp, offset, len); + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + + if (!ptr) + bpf_xdp_copy_buf(xdp, offset, buf, len, true); + else + memcpy(ptr, buf, len); + + return 0; +} + +static const struct bpf_func_proto bpf_xdp_store_bytes_proto = { + .func = bpf_xdp_store_bytes, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_PTR_TO_UNINIT_MEM, + .arg4_type = ARG_CONST_SIZE, +}; + +static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset) +{ + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); + skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1]; + struct xdp_rxq_info *rxq = xdp->rxq; + unsigned int tailroom; + + if (!rxq->frag_size || rxq->frag_size > xdp->frame_sz) + return -EOPNOTSUPP; + + tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag); + if (unlikely(offset > tailroom)) + return -EINVAL; + + memset(skb_frag_address(frag) + skb_frag_size(frag), 0, offset); + skb_frag_size_add(frag, offset); + sinfo->xdp_frags_size += offset; + + return 0; +} + +static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset) +{ + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); + int i, n_frags_free = 0, len_free = 0; + + if (unlikely(offset > (int)xdp_get_buff_len(xdp) - ETH_HLEN)) + return -EINVAL; + + for (i = sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) { + skb_frag_t *frag = &sinfo->frags[i]; + int shrink = min_t(int, offset, skb_frag_size(frag)); + + len_free += shrink; + offset -= shrink; + + if (skb_frag_size(frag) == shrink) { + struct page *page = skb_frag_page(frag); + + __xdp_return(page_address(page), &xdp->rxq->mem, + false, NULL); + n_frags_free++; + } else { + skb_frag_size_sub(frag, shrink); + break; + } + } + sinfo->nr_frags -= n_frags_free; + sinfo->xdp_frags_size -= len_free; + + if (unlikely(!sinfo->nr_frags)) { + xdp_buff_clear_frags_flag(xdp); + xdp->data_end -= offset; + } + + return 0; +} + +BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset) +{ + void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */ + void *data_end = xdp->data_end + offset; + + if (unlikely(xdp_buff_has_frags(xdp))) { /* non-linear xdp buff */ + if (offset < 0) + return bpf_xdp_frags_shrink_tail(xdp, -offset); + + return bpf_xdp_frags_increase_tail(xdp, offset); + } + + /* Notice that xdp_data_hard_end have reserved some tailroom */ + if (unlikely(data_end > data_hard_end)) + return -EINVAL; + + /* ALL drivers MUST init xdp->frame_sz, chicken check below */ + if (unlikely(xdp->frame_sz > PAGE_SIZE)) { + WARN_ONCE(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz); + return -EINVAL; + } + + if (unlikely(data_end < xdp->data + ETH_HLEN)) + return -EINVAL; + + /* Clear memory area on grow, 
can contain uninit kernel memory */ + if (offset > 0) + memset(xdp->data_end, 0, offset); + + xdp->data_end = data_end; + + return 0; +} + +static const struct bpf_func_proto bpf_xdp_adjust_tail_proto = { + .func = bpf_xdp_adjust_tail, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_ANYTHING, +}; + +BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset) +{ + void *xdp_frame_end = xdp->data_hard_start + sizeof(struct xdp_frame); + void *meta = xdp->data_meta + offset; + unsigned long metalen = xdp->data - meta; + + if (xdp_data_meta_unsupported(xdp)) + return -ENOTSUPP; + if (unlikely(meta < xdp_frame_end || + meta > xdp->data)) + return -EINVAL; + if (unlikely(xdp_metalen_invalid(metalen))) + return -EACCES; + + xdp->data_meta = meta; + + return 0; +} + +static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = { + .func = bpf_xdp_adjust_meta, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_ANYTHING, +}; + +/* XDP_REDIRECT works by a three-step process, implemented in the functions + * below: + * + * 1. The bpf_redirect() and bpf_redirect_map() helpers will lookup the target + * of the redirect and store it (along with some other metadata) in a per-CPU + * struct bpf_redirect_info. + * + * 2. When the program returns the XDP_REDIRECT return code, the driver will + * call xdp_do_redirect() which will use the information in struct + * bpf_redirect_info to actually enqueue the frame into a map type-specific + * bulk queue structure. + * + * 3. Before exiting its NAPI poll loop, the driver will call xdp_do_flush(), + * which will flush all the different bulk queues, thus completing the + * redirect. + * + * Pointers to the map entries will be kept around for this whole sequence of + * steps, protected by RCU. However, there is no top-level rcu_read_lock() in + * the core code; instead, the RCU protection relies on everything happening + * inside a single NAPI poll sequence, which means it's between a pair of calls + * to local_bh_disable()/local_bh_enable(). + * + * The map entries are marked as __rcu and the map code makes sure to + * dereference those pointers with rcu_dereference_check() in a way that works + * for both sections that to hold an rcu_read_lock() and sections that are + * called from NAPI without a separate rcu_read_lock(). The code below does not + * use RCU annotations, but relies on those in the map code. + */ +void xdp_do_flush(void) +{ + __dev_flush(); + __cpu_map_flush(); + __xsk_map_flush(); +} +EXPORT_SYMBOL_GPL(xdp_do_flush); + +void bpf_clear_redirect_map(struct bpf_map *map) +{ + struct bpf_redirect_info *ri; + int cpu; + + for_each_possible_cpu(cpu) { + ri = per_cpu_ptr(&bpf_redirect_info, cpu); + /* Avoid polluting remote cacheline due to writes if + * not needed. Once we pass this test, we need the + * cmpxchg() to make sure it hasn't been changed in + * the meantime by remote CPU. 
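
[Editor's illustration, not part of the patch] To make step 1 of the three-step sequence described above concrete: a minimal XDP program that forwards every frame through a one-slot DEVMAP. Map and function names are made up for the example. The bpf_redirect_map() call only records the target in the per-CPU bpf_redirect_info; the frame is actually enqueued later when the driver runs xdp_do_redirect(), and transmitted once the driver calls xdp_do_flush() at the end of its NAPI poll.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* one-slot devmap; userspace stores the egress ifindex at key 0 */
struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} tx_port SEC(".maps");

SEC("xdp")
int redirect_all(struct xdp_md *ctx)
{
	/* step 1 only: the enqueue (step 2) and the flush (step 3)
	 * happen in the driver after the program returns
	 */
	return bpf_redirect_map(&tx_port, 0, 0);
}

char _license[] SEC("license") = "GPL";
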
+ */ + if (unlikely(READ_ONCE(ri->map) == map)) + cmpxchg(&ri->map, map, NULL); + } +} + +DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key); +EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key); + +u32 xdp_master_redirect(struct xdp_buff *xdp) +{ + struct net_device *master, *slave; + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + + master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev); + slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); + if (slave && slave != xdp->rxq->dev) { + /* The target device is different from the receiving device, so + * redirect it to the new device. + * Using XDP_REDIRECT gets the correct behaviour from XDP enabled + * drivers to unmap the packet from their rx ring. + */ + ri->tgt_index = slave->ifindex; + ri->map_id = INT_MAX; + ri->map_type = BPF_MAP_TYPE_UNSPEC; + return XDP_REDIRECT; + } + return XDP_TX; +} +EXPORT_SYMBOL_GPL(xdp_master_redirect); + +static inline int __xdp_do_redirect_xsk(struct bpf_redirect_info *ri, + struct net_device *dev, + struct xdp_buff *xdp, + struct bpf_prog *xdp_prog) +{ + enum bpf_map_type map_type = ri->map_type; + void *fwd = ri->tgt_value; + u32 map_id = ri->map_id; + int err; + + ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */ + ri->map_type = BPF_MAP_TYPE_UNSPEC; + + err = __xsk_map_redirect(fwd, xdp); + if (unlikely(err)) + goto err; + + _trace_xdp_redirect_map(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index); + return 0; +err: + _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err); + return err; +} + +static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri, + struct net_device *dev, + struct xdp_frame *xdpf, + struct bpf_prog *xdp_prog) +{ + enum bpf_map_type map_type = ri->map_type; + void *fwd = ri->tgt_value; + u32 map_id = ri->map_id; + struct bpf_map *map; + int err; + + ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */ + ri->map_type = BPF_MAP_TYPE_UNSPEC; + + if (unlikely(!xdpf)) { + err = -EOVERFLOW; + goto err; + } + + switch (map_type) { + case BPF_MAP_TYPE_DEVMAP: + fallthrough; + case BPF_MAP_TYPE_DEVMAP_HASH: + map = READ_ONCE(ri->map); + if (unlikely(map)) { + WRITE_ONCE(ri->map, NULL); + err = dev_map_enqueue_multi(xdpf, dev, map, + ri->flags & BPF_F_EXCLUDE_INGRESS); + } else { + err = dev_map_enqueue(fwd, xdpf, dev); + } + break; + case BPF_MAP_TYPE_CPUMAP: + err = cpu_map_enqueue(fwd, xdpf, dev); + break; + case BPF_MAP_TYPE_UNSPEC: + if (map_id == INT_MAX) { + fwd = dev_get_by_index_rcu(dev_net(dev), ri->tgt_index); + if (unlikely(!fwd)) { + err = -EINVAL; + break; + } + err = dev_xdp_enqueue(fwd, xdpf, dev); + break; + } + fallthrough; + default: + err = -EBADRQC; + } + + if (unlikely(err)) + goto err; + + _trace_xdp_redirect_map(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index); + return 0; +err: + _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err); + return err; +} + +int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, + struct bpf_prog *xdp_prog) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + enum bpf_map_type map_type = ri->map_type; + + /* XDP_REDIRECT is not fully supported yet for xdp frags since + * not all XDP capable drivers can map non-linear xdp_frame in + * ndo_xdp_xmit. 
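
[Editor's illustration] Because ctx->data/ctx->data_end only cover the linear area of a multi-buffer packet, programs use the bpf_xdp_get_buff_len()/bpf_xdp_load_bytes() helpers defined above for offset-based access that can span fragments. A minimal sketch, assuming a frags-aware program (the SEC("xdp.frags") naming follows recent libbpf; all other names are illustrative):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int check_trailer(struct xdp_md *ctx)
{
	__u64 total = bpf_xdp_get_buff_len(ctx);	/* linear + frags */
	__u8 trailer[4];

	if (total < sizeof(trailer))
		return XDP_PASS;

	/* copies via bpf_xdp_copy_buf() internally when the requested
	 * range is not contained in a single linear area or fragment
	 */
	if (bpf_xdp_load_bytes(ctx, total - sizeof(trailer),
			       trailer, sizeof(trailer)))
		return XDP_PASS;

	return trailer[0] == 0xde ? XDP_DROP : XDP_PASS;
}

char _license[] SEC("license") = "GPL";
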
+ */ + if (unlikely(xdp_buff_has_frags(xdp) && + map_type != BPF_MAP_TYPE_CPUMAP)) + return -EOPNOTSUPP; + + if (map_type == BPF_MAP_TYPE_XSKMAP) + return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); + + return __xdp_do_redirect_frame(ri, dev, xdp_convert_buff_to_frame(xdp), + xdp_prog); +} +EXPORT_SYMBOL_GPL(xdp_do_redirect); + +int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp, + struct xdp_frame *xdpf, struct bpf_prog *xdp_prog) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + enum bpf_map_type map_type = ri->map_type; + + if (map_type == BPF_MAP_TYPE_XSKMAP) + return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); + + return __xdp_do_redirect_frame(ri, dev, xdpf, xdp_prog); +} +EXPORT_SYMBOL_GPL(xdp_do_redirect_frame); + +static int xdp_do_generic_redirect_map(struct net_device *dev, + struct sk_buff *skb, + struct xdp_buff *xdp, + struct bpf_prog *xdp_prog, + void *fwd, + enum bpf_map_type map_type, u32 map_id) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_map *map; + int err; + + switch (map_type) { + case BPF_MAP_TYPE_DEVMAP: + fallthrough; + case BPF_MAP_TYPE_DEVMAP_HASH: + map = READ_ONCE(ri->map); + if (unlikely(map)) { + WRITE_ONCE(ri->map, NULL); + err = dev_map_redirect_multi(dev, skb, xdp_prog, map, + ri->flags & BPF_F_EXCLUDE_INGRESS); + } else { + err = dev_map_generic_redirect(fwd, skb, xdp_prog); + } + if (unlikely(err)) + goto err; + break; + case BPF_MAP_TYPE_XSKMAP: + err = xsk_generic_rcv(fwd, xdp); + if (err) + goto err; + consume_skb(skb); + break; + case BPF_MAP_TYPE_CPUMAP: + err = cpu_map_generic_redirect(fwd, skb); + if (unlikely(err)) + goto err; + break; + default: + err = -EBADRQC; + goto err; + } + + _trace_xdp_redirect_map(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index); + return 0; +err: + _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err); + return err; +} + +int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, + struct xdp_buff *xdp, struct bpf_prog *xdp_prog) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + enum bpf_map_type map_type = ri->map_type; + void *fwd = ri->tgt_value; + u32 map_id = ri->map_id; + int err; + + ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */ + ri->map_type = BPF_MAP_TYPE_UNSPEC; + + if (map_type == BPF_MAP_TYPE_UNSPEC && map_id == INT_MAX) { + fwd = dev_get_by_index_rcu(dev_net(dev), ri->tgt_index); + if (unlikely(!fwd)) { + err = -EINVAL; + goto err; + } + + err = xdp_ok_fwd_dev(fwd, skb->len); + if (unlikely(err)) + goto err; + + skb->dev = fwd; + _trace_xdp_redirect(dev, xdp_prog, ri->tgt_index); + generic_xdp_tx(skb, xdp_prog); + return 0; + } + + return xdp_do_generic_redirect_map(dev, skb, xdp, xdp_prog, fwd, map_type, map_id); +err: + _trace_xdp_redirect_err(dev, xdp_prog, ri->tgt_index, err); + return err; +} + +BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + + if (unlikely(flags)) + return XDP_ABORTED; + + /* NB! Map type UNSPEC and map_id == INT_MAX (never generated + * by map_idr) is used for ifindex based XDP redirect. 
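
[Editor's illustration] For comparison, the map-less variant that the INT_MAX sentinel above exists to support. The sentinel itself is internal; programs simply pass an ifindex. A minimal sketch with an illustrative global filled in by the loader before attach:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* set by the loader before attach; name is illustrative */
volatile const __u32 out_ifindex = 0;

SEC("xdp")
int redirect_to_ifindex(struct xdp_md *ctx)
{
	/* any non-zero flags value yields XDP_ABORTED, see above */
	return bpf_redirect(out_ifindex, 0);
}

char _license[] SEC("license") = "GPL";
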
+ */ + ri->tgt_index = ifindex; + ri->map_id = INT_MAX; + ri->map_type = BPF_MAP_TYPE_UNSPEC; + + return XDP_REDIRECT; +} + +static const struct bpf_func_proto bpf_xdp_redirect_proto = { + .func = bpf_xdp_redirect, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_ANYTHING, + .arg2_type = ARG_ANYTHING, +}; + +BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, + u64, flags) +{ + return map->ops->map_redirect(map, ifindex, flags); +} + +static const struct bpf_func_proto bpf_xdp_redirect_map_proto = { + .func = bpf_xdp_redirect_map, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_ANYTHING, +}; + + +static unsigned long bpf_xdp_copy(void *dst, const void *ctx, + unsigned long off, unsigned long len) +{ + struct xdp_buff *xdp = (struct xdp_buff *)ctx; + + bpf_xdp_copy_buf(xdp, off, dst, len, false); + return 0; +} + +BPF_CALL_5(bpf_xdp_event_output, struct xdp_buff *, xdp, struct bpf_map *, map, + u64, flags, void *, meta, u64, meta_size) +{ + u64 xdp_size = (flags & BPF_F_CTXLEN_MASK) >> 32; + + if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK))) + return -EINVAL; + + if (unlikely(!xdp || xdp_size > xdp_get_buff_len(xdp))) + return -EFAULT; + + return bpf_event_output(map, flags, meta, meta_size, xdp, + xdp_size, bpf_xdp_copy); +} + +static const struct bpf_func_proto bpf_xdp_event_output_proto = { + .func = bpf_xdp_event_output, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_CONST_MAP_PTR, + .arg3_type = ARG_ANYTHING, + .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY, + .arg5_type = ARG_CONST_SIZE_OR_ZERO, +}; + +BTF_ID_LIST_SINGLE(bpf_xdp_output_btf_ids, struct, xdp_buff) + +const struct bpf_func_proto bpf_xdp_output_proto = { + .func = bpf_xdp_event_output, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_BTF_ID, + .arg1_btf_id = &bpf_xdp_output_btf_ids[0], + .arg2_type = ARG_CONST_MAP_PTR, + .arg3_type = ARG_ANYTHING, + .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY, + .arg5_type = ARG_CONST_SIZE_OR_ZERO, +}; + +#ifdef CONFIG_INET +bool bpf_xdp_sock_is_valid_access(int off, int size, enum bpf_access_type type, + struct bpf_insn_access_aux *info) +{ + if (off < 0 || off >= offsetofend(struct bpf_xdp_sock, queue_id)) + return false; + + if (off % size != 0) + return false; + + switch (off) { + default: + return size == sizeof(__u32); + } +} + +u32 bpf_xdp_sock_convert_ctx_access(enum bpf_access_type type, + const struct bpf_insn *si, + struct bpf_insn *insn_buf, + struct bpf_prog *prog, u32 *target_size) +{ + struct bpf_insn *insn = insn_buf; + +#define BPF_XDP_SOCK_GET(FIELD) \ + do { \ + BUILD_BUG_ON(sizeof_field(struct xdp_sock, FIELD) > \ + sizeof_field(struct bpf_xdp_sock, FIELD)); \ + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_sock, FIELD),\ + si->dst_reg, si->src_reg, \ + offsetof(struct xdp_sock, FIELD)); \ + } while (0) + + switch (si->off) { + case offsetof(struct bpf_xdp_sock, queue_id): + BPF_XDP_SOCK_GET(queue_id); + break; + } + + return insn - insn_buf; +} +#endif /* CONFIG_INET */ + +static int xdp_noop_prologue(struct bpf_insn *insn_buf, bool direct_write, + const struct bpf_prog *prog) +{ + /* Neither direct read nor direct write requires any preliminary + * action. 
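
[Editor's illustration] bpf_xdp_event_output(), defined a little further up, is what backs bpf_perf_event_output() for XDP programs: the upper 32 bits of the flags argument (BPF_F_CTXLEN_MASK) request that the first N bytes of the frame be appended to the sample via bpf_xdp_copy(). A sketch of that usage (map and struct names are illustrative; the call fails with -EFAULT if the frame is shorter than the requested sample length):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(__u32));
} events SEC(".maps");

struct sample_meta {
	__u32 pkt_len;
};

SEC("xdp")
int sample_frames(struct xdp_md *ctx)
{
	struct sample_meta meta = {
		.pkt_len = ctx->data_end - ctx->data,
	};
	const __u64 sample_len = 64;	/* first 64 bytes of the frame */

	bpf_perf_event_output(ctx, &events,
			      BPF_F_CURRENT_CPU | (sample_len << 32),
			      &meta, sizeof(meta));
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
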
+ */ + return 0; +} + +static bool __is_valid_xdp_access(int off, int size) +{ + if (off < 0 || off >= sizeof(struct xdp_md)) + return false; + if (off % size != 0) + return false; + if (size != sizeof(__u32)) + return false; + + return true; +} + +static bool xdp_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + if (prog->expected_attach_type != BPF_XDP_DEVMAP) { + switch (off) { + case offsetof(struct xdp_md, egress_ifindex): + return false; + } + } + + if (type == BPF_WRITE) { + if (bpf_prog_is_dev_bound(prog->aux)) { + switch (off) { + case offsetof(struct xdp_md, rx_queue_index): + return __is_valid_xdp_access(off, size); + } + } + return false; + } + + switch (off) { + case offsetof(struct xdp_md, data): + info->reg_type = PTR_TO_PACKET; + break; + case offsetof(struct xdp_md, data_meta): + info->reg_type = PTR_TO_PACKET_META; + break; + case offsetof(struct xdp_md, data_end): + info->reg_type = PTR_TO_PACKET_END; + break; + } + + return __is_valid_xdp_access(off, size); +} + +void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act) +{ + const u32 act_max = XDP_REDIRECT; + + pr_warn_once("%s XDP return value %u on prog %s (id %d) dev %s, expect packet loss!\n", + act > act_max ? "Illegal" : "Driver unsupported", + act, prog->aux->name, prog->aux->id, dev ? dev->name : "N/A"); +} +EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action); + +static u32 xdp_convert_ctx_access(enum bpf_access_type type, + const struct bpf_insn *si, + struct bpf_insn *insn_buf, + struct bpf_prog *prog, u32 *target_size) +{ + struct bpf_insn *insn = insn_buf; + + switch (si->off) { + case offsetof(struct xdp_md, data): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data), + si->dst_reg, si->src_reg, + offsetof(struct xdp_buff, data)); + break; + case offsetof(struct xdp_md, data_meta): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data_meta), + si->dst_reg, si->src_reg, + offsetof(struct xdp_buff, data_meta)); + break; + case offsetof(struct xdp_md, data_end): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data_end), + si->dst_reg, si->src_reg, + offsetof(struct xdp_buff, data_end)); + break; + case offsetof(struct xdp_md, ingress_ifindex): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq), + si->dst_reg, si->src_reg, + offsetof(struct xdp_buff, rxq)); + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_rxq_info, dev), + si->dst_reg, si->dst_reg, + offsetof(struct xdp_rxq_info, dev)); + *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, + offsetof(struct net_device, ifindex)); + break; + case offsetof(struct xdp_md, rx_queue_index): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq), + si->dst_reg, si->src_reg, + offsetof(struct xdp_buff, rxq)); + *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, + offsetof(struct xdp_rxq_info, + queue_index)); + break; + case offsetof(struct xdp_md, egress_ifindex): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, txq), + si->dst_reg, si->src_reg, + offsetof(struct xdp_buff, txq)); + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_txq_info, dev), + si->dst_reg, si->dst_reg, + offsetof(struct xdp_txq_info, dev)); + *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, + offsetof(struct net_device, ifindex)); + break; + } + + return insn - insn_buf; +} + +bool xdp_helper_changes_pkt_data(const void *func) +{ + return func == bpf_xdp_adjust_head || + func == bpf_xdp_adjust_meta || + func == 
bpf_xdp_adjust_tail; +} + +static const struct bpf_func_proto * +xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + switch (func_id) { + case BPF_FUNC_perf_event_output: + return &bpf_xdp_event_output_proto; + case BPF_FUNC_get_smp_processor_id: + return &bpf_get_smp_processor_id_proto; + case BPF_FUNC_xdp_adjust_head: + return &bpf_xdp_adjust_head_proto; + case BPF_FUNC_xdp_adjust_meta: + return &bpf_xdp_adjust_meta_proto; + case BPF_FUNC_redirect: + return &bpf_xdp_redirect_proto; + case BPF_FUNC_redirect_map: + return &bpf_xdp_redirect_map_proto; + case BPF_FUNC_xdp_adjust_tail: + return &bpf_xdp_adjust_tail_proto; + case BPF_FUNC_xdp_get_buff_len: + return &bpf_xdp_get_buff_len_proto; + case BPF_FUNC_xdp_load_bytes: + return &bpf_xdp_load_bytes_proto; + case BPF_FUNC_xdp_store_bytes: + return &bpf_xdp_store_bytes_proto; + default: + return xdp_inet_func_proto(func_id); + } +} + +const struct bpf_verifier_ops xdp_verifier_ops = { + .get_func_proto = xdp_func_proto, + .is_valid_access = xdp_is_valid_access, + .convert_ctx_access = xdp_convert_ctx_access, + .gen_prologue = xdp_noop_prologue, +}; + +const struct bpf_prog_ops xdp_prog_ops = { + .test_run = bpf_prog_test_run_xdp, +}; + +DEFINE_BPF_DISPATCHER(xdp) + +void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog) +{ + bpf_dispatcher_change_prog(BPF_DISPATCHER_PTR(xdp), prev_prog, prog); +} diff --git a/net/core/Makefile b/net/core/Makefile index e8ce3bd283a6..f6eceff1cf36 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -12,7 +12,7 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o obj-y += dev.o dev_addr_lists.o dst.o netevent.o \ neighbour.o rtnetlink.o utils.o link_watch.o filter.o \ sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \ - fib_notifier.o xdp.o flow_offload.o gro.o + fib_notifier.o flow_offload.o gro.o obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o diff --git a/net/core/dev.c b/net/core/dev.c index 8958c4227b67..52b64d24c439 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1593,23 +1593,6 @@ void dev_disable_lro(struct net_device *dev) } EXPORT_SYMBOL(dev_disable_lro); -/** - * dev_disable_gro_hw - disable HW Generic Receive Offload on a device - * @dev: device - * - * Disable HW Generic Receive Offload (GRO_HW) on a net device. Must be - * called under RTNL. This is needed if Generic XDP is installed on - * the device. 
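
[Editor's illustration] The generic XDP path referenced above is what userspace selects with XDP_FLAGS_SKB_MODE. A minimal loader sketch, assuming a libbpf recent enough to provide bpf_xdp_attach() (function and variable names are illustrative):

#include <errno.h>
#include <net/if.h>
#include <linux/if_link.h>
#include <bpf/bpf.h>

int attach_generic_xdp(const char *ifname, int prog_fd)
{
	int ifindex = if_nametoindex(ifname);

	if (!ifindex)
		return -errno;

	/* XDP_FLAGS_SKB_MODE maps to XDP_MODE_SKB in dev_xdp_mode()
	 * and thus to the generic (skb-based) hook
	 */
	return bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_SKB_MODE, NULL);
}

The link-based alternative handled by bpf_xdp_link_attach() above is reachable from userspace via bpf_program__attach_xdp(), which ties the attachment to the lifetime of the returned bpf_link.
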
- */ -static void dev_disable_gro_hw(struct net_device *dev) -{ - dev->wanted_features &= ~NETIF_F_GRO_HW; - netdev_update_features(dev); - - if (unlikely(dev->features & NETIF_F_GRO_HW)) - netdev_WARN(dev, "failed to disable GRO_HW!\n"); -} - const char *netdev_cmd_to_name(enum netdev_cmd cmd) { #define N(val) \ @@ -4696,227 +4679,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu, return NET_RX_DROP; } -static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb) -{ - struct net_device *dev = skb->dev; - struct netdev_rx_queue *rxqueue; - - rxqueue = dev->_rx; - - if (skb_rx_queue_recorded(skb)) { - u16 index = skb_get_rx_queue(skb); - - if (unlikely(index >= dev->real_num_rx_queues)) { - WARN_ONCE(dev->real_num_rx_queues > 1, - "%s received packet on queue %u, but number " - "of RX queues is %u\n", - dev->name, index, dev->real_num_rx_queues); - - return rxqueue; /* Return first rxqueue */ - } - rxqueue += index; - } - return rxqueue; -} - -u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) -{ - void *orig_data, *orig_data_end, *hard_start; - struct netdev_rx_queue *rxqueue; - bool orig_bcast, orig_host; - u32 mac_len, frame_sz; - __be16 orig_eth_type; - struct ethhdr *eth; - u32 metalen, act; - int off; - - /* The XDP program wants to see the packet starting at the MAC - * header. - */ - mac_len = skb->data - skb_mac_header(skb); - hard_start = skb->data - skb_headroom(skb); - - /* SKB "head" area always have tailroom for skb_shared_info */ - frame_sz = (void *)skb_end_pointer(skb) - hard_start; - frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - - rxqueue = netif_get_rxqueue(skb); - xdp_init_buff(xdp, frame_sz, &rxqueue->xdp_rxq); - xdp_prepare_buff(xdp, hard_start, skb_headroom(skb) - mac_len, - skb_headlen(skb) + mac_len, true); - - orig_data_end = xdp->data_end; - orig_data = xdp->data; - eth = (struct ethhdr *)xdp->data; - orig_host = ether_addr_equal_64bits(eth->h_dest, skb->dev->dev_addr); - orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest); - orig_eth_type = eth->h_proto; - - act = bpf_prog_run_xdp(xdp_prog, xdp); - - /* check if bpf_xdp_adjust_head was used */ - off = xdp->data - orig_data; - if (off) { - if (off > 0) - __skb_pull(skb, off); - else if (off < 0) - __skb_push(skb, -off); - - skb->mac_header += off; - skb_reset_network_header(skb); - } - - /* check if bpf_xdp_adjust_tail was used */ - off = xdp->data_end - orig_data_end; - if (off != 0) { - skb_set_tail_pointer(skb, xdp->data_end - xdp->data); - skb->len += off; /* positive on grow, negative on shrink */ - } - - /* check if XDP changed eth hdr such SKB needs update */ - eth = (struct ethhdr *)xdp->data; - if ((orig_eth_type != eth->h_proto) || - (orig_host != ether_addr_equal_64bits(eth->h_dest, - skb->dev->dev_addr)) || - (orig_bcast != is_multicast_ether_addr_64bits(eth->h_dest))) { - __skb_push(skb, ETH_HLEN); - skb->pkt_type = PACKET_HOST; - skb->protocol = eth_type_trans(skb, skb->dev); - } - - /* Redirect/Tx gives L2 packet, code that will reuse skb must __skb_pull - * before calling us again on redirect path. We do not call do_redirect - * as we leave that up to the caller. - * - * Caller is responsible for managing lifetime of skb (i.e. calling - * kfree_skb in response to actions it cannot handle/XDP_DROP). 
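
[Editor's illustration] The lifetime contract described above is easiest to see from a caller's perspective. A hedged sketch of how a driver without native XDP support might invoke the generic hook, modeled loosely on what drivers/net/tun.c does (the fakedrv names are fictional):

struct fakedrv_priv {
	struct bpf_prog __rcu *xdp_prog;
};

static int fakedrv_rx(struct fakedrv_priv *priv, struct sk_buff *skb)
{
	struct bpf_prog *xdp_prog;

	rcu_read_lock();
	xdp_prog = rcu_dereference(priv->xdp_prog);
	if (xdp_prog && do_xdp_generic(xdp_prog, skb) != XDP_PASS) {
		/* skb was redirected, transmitted or freed inside
		 * do_xdp_generic(); nothing left to clean up here
		 */
		rcu_read_unlock();
		return NET_RX_DROP;
	}
	rcu_read_unlock();

	return netif_receive_skb(skb);
}
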
- */ - switch (act) { - case XDP_REDIRECT: - case XDP_TX: - __skb_push(skb, mac_len); - break; - case XDP_PASS: - metalen = xdp->data - xdp->data_meta; - if (metalen) - skb_metadata_set(skb, metalen); - break; - } - - return act; -} - -static u32 netif_receive_generic_xdp(struct sk_buff *skb, - struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) -{ - u32 act = XDP_DROP; - - /* Reinjected packets coming from act_mirred or similar should - * not get XDP generic processing. - */ - if (skb_is_redirected(skb)) - return XDP_PASS; - - /* XDP packets must be linear and must have sufficient headroom - * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also - * native XDP provides, thus we need to do it here as well. - */ - if (skb_cloned(skb) || skb_is_nonlinear(skb) || - skb_headroom(skb) < XDP_PACKET_HEADROOM) { - int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb); - int troom = skb->tail + skb->data_len - skb->end; - - /* In case we have to go down the path and also linearize, - * then lets do the pskb_expand_head() work just once here. - */ - if (pskb_expand_head(skb, - hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0, - troom > 0 ? troom + 128 : 0, GFP_ATOMIC)) - goto do_drop; - if (skb_linearize(skb)) - goto do_drop; - } - - act = bpf_prog_run_generic_xdp(skb, xdp, xdp_prog); - switch (act) { - case XDP_REDIRECT: - case XDP_TX: - case XDP_PASS: - break; - default: - bpf_warn_invalid_xdp_action(skb->dev, xdp_prog, act); - fallthrough; - case XDP_ABORTED: - trace_xdp_exception(skb->dev, xdp_prog, act); - fallthrough; - case XDP_DROP: - do_drop: - kfree_skb(skb); - break; - } - - return act; -} - -/* When doing generic XDP we have to bypass the qdisc layer and the - * network taps in order to match in-driver-XDP behavior. - */ -void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog) -{ - struct net_device *dev = skb->dev; - struct netdev_queue *txq; - bool free_skb = true; - int cpu, rc; - - txq = netdev_core_pick_tx(dev, skb, NULL); - cpu = smp_processor_id(); - HARD_TX_LOCK(dev, txq, cpu); - if (!netif_xmit_stopped(txq)) { - rc = netdev_start_xmit(skb, dev, txq, 0); - if (dev_xmit_complete(rc)) - free_skb = false; - } - HARD_TX_UNLOCK(dev, txq); - if (free_skb) { - trace_xdp_exception(dev, xdp_prog, XDP_TX); - kfree_skb(skb); - } -} - -static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key); - -int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) -{ - if (xdp_prog) { - struct xdp_buff xdp; - u32 act; - int err; - - act = netif_receive_generic_xdp(skb, &xdp, xdp_prog); - if (act != XDP_PASS) { - switch (act) { - case XDP_REDIRECT: - err = xdp_do_generic_redirect(skb->dev, skb, - &xdp, xdp_prog); - if (err) - goto out_redir; - break; - case XDP_TX: - generic_xdp_tx(skb, xdp_prog); - break; - } - return XDP_DROP; - } - } - return XDP_PASS; -out_redir: - kfree_skb_reason(skb, SKB_DROP_REASON_XDP); - return XDP_DROP; -} -EXPORT_SYMBOL_GPL(do_xdp_generic); - static int netif_rx_internal(struct sk_buff *skb) { int ret; @@ -5624,35 +5386,6 @@ static void __netif_receive_skb_list(struct list_head *head) memalloc_noreclaim_restore(noreclaim_flag); } -static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp) -{ - struct bpf_prog *old = rtnl_dereference(dev->xdp_prog); - struct bpf_prog *new = xdp->prog; - int ret = 0; - - switch (xdp->command) { - case XDP_SETUP_PROG: - rcu_assign_pointer(dev->xdp_prog, new); - if (old) - bpf_prog_put(old); - - if (old && !new) { - static_branch_dec(&generic_xdp_needed_key); - } else if (new && !old) { - 
static_branch_inc(&generic_xdp_needed_key); - dev_disable_lro(dev); - dev_disable_gro_hw(dev); - } - break; - - default: - ret = -EINVAL; - break; - } - - return ret; -} - static int netif_receive_skb_internal(struct sk_buff *skb) { int ret; @@ -9016,510 +8749,6 @@ void dev_change_proto_down_reason(struct net_device *dev, unsigned long mask, } } -struct bpf_xdp_link { - struct bpf_link link; - struct net_device *dev; /* protected by rtnl_lock, no refcnt held */ - int flags; -}; - -static enum bpf_xdp_mode dev_xdp_mode(struct net_device *dev, u32 flags) -{ - if (flags & XDP_FLAGS_HW_MODE) - return XDP_MODE_HW; - if (flags & XDP_FLAGS_DRV_MODE) - return XDP_MODE_DRV; - if (flags & XDP_FLAGS_SKB_MODE) - return XDP_MODE_SKB; - return dev->netdev_ops->ndo_bpf ? XDP_MODE_DRV : XDP_MODE_SKB; -} - -static bpf_op_t dev_xdp_bpf_op(struct net_device *dev, enum bpf_xdp_mode mode) -{ - switch (mode) { - case XDP_MODE_SKB: - return generic_xdp_install; - case XDP_MODE_DRV: - case XDP_MODE_HW: - return dev->netdev_ops->ndo_bpf; - default: - return NULL; - } -} - -static struct bpf_xdp_link *dev_xdp_link(struct net_device *dev, - enum bpf_xdp_mode mode) -{ - return dev->xdp_state[mode].link; -} - -static struct bpf_prog *dev_xdp_prog(struct net_device *dev, - enum bpf_xdp_mode mode) -{ - struct bpf_xdp_link *link = dev_xdp_link(dev, mode); - - if (link) - return link->link.prog; - return dev->xdp_state[mode].prog; -} - -u8 dev_xdp_prog_count(struct net_device *dev) -{ - u8 count = 0; - int i; - - for (i = 0; i < __MAX_XDP_MODE; i++) - if (dev->xdp_state[i].prog || dev->xdp_state[i].link) - count++; - return count; -} -EXPORT_SYMBOL_GPL(dev_xdp_prog_count); - -u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode) -{ - struct bpf_prog *prog = dev_xdp_prog(dev, mode); - - return prog ? prog->aux->id : 0; -} - -static void dev_xdp_set_link(struct net_device *dev, enum bpf_xdp_mode mode, - struct bpf_xdp_link *link) -{ - dev->xdp_state[mode].link = link; - dev->xdp_state[mode].prog = NULL; -} - -static void dev_xdp_set_prog(struct net_device *dev, enum bpf_xdp_mode mode, - struct bpf_prog *prog) -{ - dev->xdp_state[mode].link = NULL; - dev->xdp_state[mode].prog = prog; -} - -static int dev_xdp_install(struct net_device *dev, enum bpf_xdp_mode mode, - bpf_op_t bpf_op, struct netlink_ext_ack *extack, - u32 flags, struct bpf_prog *prog) -{ - struct netdev_bpf xdp; - int err; - - memset(&xdp, 0, sizeof(xdp)); - xdp.command = mode == XDP_MODE_HW ? XDP_SETUP_PROG_HW : XDP_SETUP_PROG; - xdp.extack = extack; - xdp.flags = flags; - xdp.prog = prog; - - /* Drivers assume refcnt is already incremented (i.e, prog pointer is - * "moved" into driver), so they don't increment it on their own, but - * they do decrement refcnt when program is detached or replaced. - * Given net_device also owns link/prog, we need to bump refcnt here - * to prevent drivers from underflowing it. 
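
[Editor's illustration] The ownership rule spelled out above means a driver's XDP_SETUP_PROG handler receives an already-elevated reference and only drops the one held on the program it replaces. A hedged sketch of a fictional driver's .ndo_bpf callback, reusing fakedrv_priv from the earlier sketch (a real driver would also have to reconfigure its rings):

static int fakedrv_bpf(struct net_device *dev, struct netdev_bpf *bpf)
{
	struct fakedrv_priv *priv = netdev_priv(dev);
	struct bpf_prog *old;

	switch (bpf->command) {
	case XDP_SETUP_PROG:
		/* dev_xdp_install() already bumped the refcount for us,
		 * so the new pointer can simply be stolen...
		 */
		old = rcu_replace_pointer(priv->xdp_prog, bpf->prog,
					  lockdep_rtnl_is_held());
		/* ...and only the replaced program's reference is dropped */
		if (old)
			bpf_prog_put(old);
		return 0;
	default:
		return -EINVAL;
	}
}
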
- */ - if (prog) - bpf_prog_inc(prog); - err = bpf_op(dev, &xdp); - if (err) { - if (prog) - bpf_prog_put(prog); - return err; - } - - if (mode != XDP_MODE_HW) - bpf_prog_change_xdp(dev_xdp_prog(dev, mode), prog); - - return 0; -} - -static void dev_xdp_uninstall(struct net_device *dev) -{ - struct bpf_xdp_link *link; - struct bpf_prog *prog; - enum bpf_xdp_mode mode; - bpf_op_t bpf_op; - - ASSERT_RTNL(); - - for (mode = XDP_MODE_SKB; mode < __MAX_XDP_MODE; mode++) { - prog = dev_xdp_prog(dev, mode); - if (!prog) - continue; - - bpf_op = dev_xdp_bpf_op(dev, mode); - if (!bpf_op) - continue; - - WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL)); - - /* auto-detach link from net device */ - link = dev_xdp_link(dev, mode); - if (link) - link->dev = NULL; - else - bpf_prog_put(prog); - - dev_xdp_set_link(dev, mode, NULL); - } -} - -static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack, - struct bpf_xdp_link *link, struct bpf_prog *new_prog, - struct bpf_prog *old_prog, u32 flags) -{ - unsigned int num_modes = hweight32(flags & XDP_FLAGS_MODES); - struct bpf_prog *cur_prog; - struct net_device *upper; - struct list_head *iter; - enum bpf_xdp_mode mode; - bpf_op_t bpf_op; - int err; - - ASSERT_RTNL(); - - /* either link or prog attachment, never both */ - if (link && (new_prog || old_prog)) - return -EINVAL; - /* link supports only XDP mode flags */ - if (link && (flags & ~XDP_FLAGS_MODES)) { - NL_SET_ERR_MSG(extack, "Invalid XDP flags for BPF link attachment"); - return -EINVAL; - } - /* just one XDP mode bit should be set, zero defaults to drv/skb mode */ - if (num_modes > 1) { - NL_SET_ERR_MSG(extack, "Only one XDP mode flag can be set"); - return -EINVAL; - } - /* avoid ambiguity if offload + drv/skb mode progs are both loaded */ - if (!num_modes && dev_xdp_prog_count(dev) > 1) { - NL_SET_ERR_MSG(extack, - "More than one program loaded, unset mode is ambiguous"); - return -EINVAL; - } - /* old_prog != NULL implies XDP_FLAGS_REPLACE is set */ - if (old_prog && !(flags & XDP_FLAGS_REPLACE)) { - NL_SET_ERR_MSG(extack, "XDP_FLAGS_REPLACE is not specified"); - return -EINVAL; - } - - mode = dev_xdp_mode(dev, flags); - /* can't replace attached link */ - if (dev_xdp_link(dev, mode)) { - NL_SET_ERR_MSG(extack, "Can't replace active BPF XDP link"); - return -EBUSY; - } - - /* don't allow if an upper device already has a program */ - netdev_for_each_upper_dev_rcu(dev, upper, iter) { - if (dev_xdp_prog_count(upper) > 0) { - NL_SET_ERR_MSG(extack, "Cannot attach when an upper device already has a program"); - return -EEXIST; - } - } - - cur_prog = dev_xdp_prog(dev, mode); - /* can't replace attached prog with link */ - if (link && cur_prog) { - NL_SET_ERR_MSG(extack, "Can't replace active XDP program with BPF link"); - return -EBUSY; - } - if ((flags & XDP_FLAGS_REPLACE) && cur_prog != old_prog) { - NL_SET_ERR_MSG(extack, "Active program does not match expected"); - return -EEXIST; - } - - /* put effective new program into new_prog */ - if (link) - new_prog = link->link.prog; - - if (new_prog) { - bool offload = mode == XDP_MODE_HW; - enum bpf_xdp_mode other_mode = mode == XDP_MODE_SKB - ? 
XDP_MODE_DRV : XDP_MODE_SKB; - - if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) && cur_prog) { - NL_SET_ERR_MSG(extack, "XDP program already attached"); - return -EBUSY; - } - if (!offload && dev_xdp_prog(dev, other_mode)) { - NL_SET_ERR_MSG(extack, "Native and generic XDP can't be active at the same time"); - return -EEXIST; - } - if (!offload && bpf_prog_is_dev_bound(new_prog->aux)) { - NL_SET_ERR_MSG(extack, "Using device-bound program without HW_MODE flag is not supported"); - return -EINVAL; - } - if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) { - NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device"); - return -EINVAL; - } - if (new_prog->expected_attach_type == BPF_XDP_CPUMAP) { - NL_SET_ERR_MSG(extack, "BPF_XDP_CPUMAP programs can not be attached to a device"); - return -EINVAL; - } - } - - /* don't call drivers if the effective program didn't change */ - if (new_prog != cur_prog) { - bpf_op = dev_xdp_bpf_op(dev, mode); - if (!bpf_op) { - NL_SET_ERR_MSG(extack, "Underlying driver does not support XDP in native mode"); - return -EOPNOTSUPP; - } - - err = dev_xdp_install(dev, mode, bpf_op, extack, flags, new_prog); - if (err) - return err; - } - - if (link) - dev_xdp_set_link(dev, mode, link); - else - dev_xdp_set_prog(dev, mode, new_prog); - if (cur_prog) - bpf_prog_put(cur_prog); - - return 0; -} - -static int dev_xdp_attach_link(struct net_device *dev, - struct netlink_ext_ack *extack, - struct bpf_xdp_link *link) -{ - return dev_xdp_attach(dev, extack, link, NULL, NULL, link->flags); -} - -static int dev_xdp_detach_link(struct net_device *dev, - struct netlink_ext_ack *extack, - struct bpf_xdp_link *link) -{ - enum bpf_xdp_mode mode; - bpf_op_t bpf_op; - - ASSERT_RTNL(); - - mode = dev_xdp_mode(dev, link->flags); - if (dev_xdp_link(dev, mode) != link) - return -EINVAL; - - bpf_op = dev_xdp_bpf_op(dev, mode); - WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL)); - dev_xdp_set_link(dev, mode, NULL); - return 0; -} - -static void bpf_xdp_link_release(struct bpf_link *link) -{ - struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); - - rtnl_lock(); - - /* if racing with net_device's tear down, xdp_link->dev might be - * already NULL, in which case link was already auto-detached - */ - if (xdp_link->dev) { - WARN_ON(dev_xdp_detach_link(xdp_link->dev, NULL, xdp_link)); - xdp_link->dev = NULL; - } - - rtnl_unlock(); -} - -static int bpf_xdp_link_detach(struct bpf_link *link) -{ - bpf_xdp_link_release(link); - return 0; -} - -static void bpf_xdp_link_dealloc(struct bpf_link *link) -{ - struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); - - kfree(xdp_link); -} - -static void bpf_xdp_link_show_fdinfo(const struct bpf_link *link, - struct seq_file *seq) -{ - struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); - u32 ifindex = 0; - - rtnl_lock(); - if (xdp_link->dev) - ifindex = xdp_link->dev->ifindex; - rtnl_unlock(); - - seq_printf(seq, "ifindex:\t%u\n", ifindex); -} - -static int bpf_xdp_link_fill_link_info(const struct bpf_link *link, - struct bpf_link_info *info) -{ - struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); - u32 ifindex = 0; - - rtnl_lock(); - if (xdp_link->dev) - ifindex = xdp_link->dev->ifindex; - rtnl_unlock(); - - info->xdp.ifindex = ifindex; - return 0; -} - -static int bpf_xdp_link_update(struct bpf_link *link, struct bpf_prog *new_prog, - struct bpf_prog *old_prog) -{ - struct bpf_xdp_link *xdp_link = container_of(link, struct 
bpf_xdp_link, link); - enum bpf_xdp_mode mode; - bpf_op_t bpf_op; - int err = 0; - - rtnl_lock(); - - /* link might have been auto-released already, so fail */ - if (!xdp_link->dev) { - err = -ENOLINK; - goto out_unlock; - } - - if (old_prog && link->prog != old_prog) { - err = -EPERM; - goto out_unlock; - } - old_prog = link->prog; - if (old_prog->type != new_prog->type || - old_prog->expected_attach_type != new_prog->expected_attach_type) { - err = -EINVAL; - goto out_unlock; - } - - if (old_prog == new_prog) { - /* no-op, don't disturb drivers */ - bpf_prog_put(new_prog); - goto out_unlock; - } - - mode = dev_xdp_mode(xdp_link->dev, xdp_link->flags); - bpf_op = dev_xdp_bpf_op(xdp_link->dev, mode); - err = dev_xdp_install(xdp_link->dev, mode, bpf_op, NULL, - xdp_link->flags, new_prog); - if (err) - goto out_unlock; - - old_prog = xchg(&link->prog, new_prog); - bpf_prog_put(old_prog); - -out_unlock: - rtnl_unlock(); - return err; -} - -static const struct bpf_link_ops bpf_xdp_link_lops = { - .release = bpf_xdp_link_release, - .dealloc = bpf_xdp_link_dealloc, - .detach = bpf_xdp_link_detach, - .show_fdinfo = bpf_xdp_link_show_fdinfo, - .fill_link_info = bpf_xdp_link_fill_link_info, - .update_prog = bpf_xdp_link_update, -}; - -int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) -{ - struct net *net = current->nsproxy->net_ns; - struct bpf_link_primer link_primer; - struct bpf_xdp_link *link; - struct net_device *dev; - int err, fd; - - rtnl_lock(); - dev = dev_get_by_index(net, attr->link_create.target_ifindex); - if (!dev) { - rtnl_unlock(); - return -EINVAL; - } - - link = kzalloc(sizeof(*link), GFP_USER); - if (!link) { - err = -ENOMEM; - goto unlock; - } - - bpf_link_init(&link->link, BPF_LINK_TYPE_XDP, &bpf_xdp_link_lops, prog); - link->dev = dev; - link->flags = attr->link_create.flags; - - err = bpf_link_prime(&link->link, &link_primer); - if (err) { - kfree(link); - goto unlock; - } - - err = dev_xdp_attach_link(dev, NULL, link); - rtnl_unlock(); - - if (err) { - link->dev = NULL; - bpf_link_cleanup(&link_primer); - goto out_put_dev; - } - - fd = bpf_link_settle(&link_primer); - /* link itself doesn't hold dev's refcnt to not complicate shutdown */ - dev_put(dev); - return fd; - -unlock: - rtnl_unlock(); - -out_put_dev: - dev_put(dev); - return err; -} - -/** - * dev_change_xdp_fd - set or clear a bpf program for a device rx path - * @dev: device - * @extack: netlink extended ack - * @fd: new program fd or negative value to clear - * @expected_fd: old program fd that userspace expects to replace or clear - * @flags: xdp-related flags - * - * Set or clear a bpf program for a device - */ -int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, - int fd, int expected_fd, u32 flags) -{ - enum bpf_xdp_mode mode = dev_xdp_mode(dev, flags); - struct bpf_prog *new_prog = NULL, *old_prog = NULL; - int err; - - ASSERT_RTNL(); - - if (fd >= 0) { - new_prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP, - mode != XDP_MODE_SKB); - if (IS_ERR(new_prog)) - return PTR_ERR(new_prog); - } - - if (expected_fd >= 0) { - old_prog = bpf_prog_get_type_dev(expected_fd, BPF_PROG_TYPE_XDP, - mode != XDP_MODE_SKB); - if (IS_ERR(old_prog)) { - err = PTR_ERR(old_prog); - old_prog = NULL; - goto err_out; - } - } - - err = dev_xdp_attach(dev, extack, NULL, new_prog, old_prog, flags); - -err_out: - if (err && new_prog) - bpf_prog_put(new_prog); - if (old_prog) - bpf_prog_put(old_prog); - return err; -} - /** * dev_new_index - allocate an ifindex * @net: the applicable 
net namespace diff --git a/net/core/dev.h b/net/core/dev.h index cbb8a925175a..36a68992f17b 100644 --- a/net/core/dev.h +++ b/net/core/dev.h @@ -78,10 +78,6 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down); void dev_change_proto_down_reason(struct net_device *dev, unsigned long mask, u32 value); -typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); -int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, - int fd, int expected_fd, u32 flags); - int dev_change_tx_queue_len(struct net_device *dev, unsigned long new_len); void dev_set_group(struct net_device *dev, int new_group); int dev_change_carrier(struct net_device *dev, bool new_carrier); diff --git a/net/core/filter.c b/net/core/filter.c index 151aa4756bd6..3933465eb972 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3788,641 +3788,6 @@ static const struct bpf_func_proto sk_skb_change_head_proto = { .arg3_type = ARG_ANYTHING, }; -BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff*, xdp) -{ - return xdp_get_buff_len(xdp); -} - -static const struct bpf_func_proto bpf_xdp_get_buff_len_proto = { - .func = bpf_xdp_get_buff_len, - .gpl_only = false, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_CTX, -}; - -BTF_ID_LIST_SINGLE(bpf_xdp_get_buff_len_bpf_ids, struct, xdp_buff) - -const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto = { - .func = bpf_xdp_get_buff_len, - .gpl_only = false, - .arg1_type = ARG_PTR_TO_BTF_ID, - .arg1_btf_id = &bpf_xdp_get_buff_len_bpf_ids[0], -}; - -static unsigned long xdp_get_metalen(const struct xdp_buff *xdp) -{ - return xdp_data_meta_unsupported(xdp) ? 0 : - xdp->data - xdp->data_meta; -} - -BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset) -{ - void *xdp_frame_end = xdp->data_hard_start + sizeof(struct xdp_frame); - unsigned long metalen = xdp_get_metalen(xdp); - void *data_start = xdp_frame_end + metalen; - void *data = xdp->data + offset; - - if (unlikely(data < data_start || - data > xdp->data_end - ETH_HLEN)) - return -EINVAL; - - if (metalen) - memmove(xdp->data_meta + offset, - xdp->data_meta, metalen); - xdp->data_meta += offset; - xdp->data = data; - - return 0; -} - -static const struct bpf_func_proto bpf_xdp_adjust_head_proto = { - .func = bpf_xdp_adjust_head, - .gpl_only = false, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_CTX, - .arg2_type = ARG_ANYTHING, -}; - -static void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, - void *buf, unsigned long len, bool flush) -{ - unsigned long ptr_len, ptr_off = 0; - skb_frag_t *next_frag, *end_frag; - struct skb_shared_info *sinfo; - void *src, *dst; - u8 *ptr_buf; - - if (likely(xdp->data_end - xdp->data >= off + len)) { - src = flush ? buf : xdp->data + off; - dst = flush ? xdp->data + off : buf; - memcpy(dst, src, len); - return; - } - - sinfo = xdp_get_shared_info_from_buff(xdp); - end_frag = &sinfo->frags[sinfo->nr_frags]; - next_frag = &sinfo->frags[0]; - - ptr_len = xdp->data_end - xdp->data; - ptr_buf = xdp->data; - - while (true) { - if (off < ptr_off + ptr_len) { - unsigned long copy_off = off - ptr_off; - unsigned long copy_len = min(len, ptr_len - copy_off); - - src = flush ? buf : ptr_buf + copy_off; - dst = flush ? 
ptr_buf + copy_off : buf; - memcpy(dst, src, copy_len); - - off += copy_len; - len -= copy_len; - buf += copy_len; - } - - if (!len || next_frag == end_frag) - break; - - ptr_off += ptr_len; - ptr_buf = skb_frag_address(next_frag); - ptr_len = skb_frag_size(next_frag); - next_frag++; - } -} - -static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) -{ - struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); - u32 size = xdp->data_end - xdp->data; - void *addr = xdp->data; - int i; - - if (unlikely(offset > 0xffff || len > 0xffff)) - return ERR_PTR(-EFAULT); - - if (offset + len > xdp_get_buff_len(xdp)) - return ERR_PTR(-EINVAL); - - if (offset < size) /* linear area */ - goto out; - - offset -= size; - for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */ - u32 frag_size = skb_frag_size(&sinfo->frags[i]); - - if (offset < frag_size) { - addr = skb_frag_address(&sinfo->frags[i]); - size = frag_size; - break; - } - offset -= frag_size; - } -out: - return offset + len < size ? addr + offset : NULL; -} - -BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset, - void *, buf, u32, len) -{ - void *ptr; - - ptr = bpf_xdp_pointer(xdp, offset, len); - if (IS_ERR(ptr)) - return PTR_ERR(ptr); - - if (!ptr) - bpf_xdp_copy_buf(xdp, offset, buf, len, false); - else - memcpy(buf, ptr, len); - - return 0; -} - -static const struct bpf_func_proto bpf_xdp_load_bytes_proto = { - .func = bpf_xdp_load_bytes, - .gpl_only = false, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_CTX, - .arg2_type = ARG_ANYTHING, - .arg3_type = ARG_PTR_TO_UNINIT_MEM, - .arg4_type = ARG_CONST_SIZE, -}; - -BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, - void *, buf, u32, len) -{ - void *ptr; - - ptr = bpf_xdp_pointer(xdp, offset, len); - if (IS_ERR(ptr)) - return PTR_ERR(ptr); - - if (!ptr) - bpf_xdp_copy_buf(xdp, offset, buf, len, true); - else - memcpy(ptr, buf, len); - - return 0; -} - -static const struct bpf_func_proto bpf_xdp_store_bytes_proto = { - .func = bpf_xdp_store_bytes, - .gpl_only = false, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_CTX, - .arg2_type = ARG_ANYTHING, - .arg3_type = ARG_PTR_TO_UNINIT_MEM, - .arg4_type = ARG_CONST_SIZE, -}; - -static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset) -{ - struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); - skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1]; - struct xdp_rxq_info *rxq = xdp->rxq; - unsigned int tailroom; - - if (!rxq->frag_size || rxq->frag_size > xdp->frame_sz) - return -EOPNOTSUPP; - - tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag); - if (unlikely(offset > tailroom)) - return -EINVAL; - - memset(skb_frag_address(frag) + skb_frag_size(frag), 0, offset); - skb_frag_size_add(frag, offset); - sinfo->xdp_frags_size += offset; - - return 0; -} - -static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset) -{ - struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); - int i, n_frags_free = 0, len_free = 0; - - if (unlikely(offset > (int)xdp_get_buff_len(xdp) - ETH_HLEN)) - return -EINVAL; - - for (i = sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) { - skb_frag_t *frag = &sinfo->frags[i]; - int shrink = min_t(int, offset, skb_frag_size(frag)); - - len_free += shrink; - offset -= shrink; - - if (skb_frag_size(frag) == shrink) { - struct page *page = skb_frag_page(frag); - - __xdp_return(page_address(page), &xdp->rxq->mem, - false, NULL); - n_frags_free++; - } else { - skb_frag_size_sub(frag, shrink); - 
break; - } - } - sinfo->nr_frags -= n_frags_free; - sinfo->xdp_frags_size -= len_free; - - if (unlikely(!sinfo->nr_frags)) { - xdp_buff_clear_frags_flag(xdp); - xdp->data_end -= offset; - } - - return 0; -} - -BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset) -{ - void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */ - void *data_end = xdp->data_end + offset; - - if (unlikely(xdp_buff_has_frags(xdp))) { /* non-linear xdp buff */ - if (offset < 0) - return bpf_xdp_frags_shrink_tail(xdp, -offset); - - return bpf_xdp_frags_increase_tail(xdp, offset); - } - - /* Notice that xdp_data_hard_end have reserved some tailroom */ - if (unlikely(data_end > data_hard_end)) - return -EINVAL; - - /* ALL drivers MUST init xdp->frame_sz, chicken check below */ - if (unlikely(xdp->frame_sz > PAGE_SIZE)) { - WARN_ONCE(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz); - return -EINVAL; - } - - if (unlikely(data_end < xdp->data + ETH_HLEN)) - return -EINVAL; - - /* Clear memory area on grow, can contain uninit kernel memory */ - if (offset > 0) - memset(xdp->data_end, 0, offset); - - xdp->data_end = data_end; - - return 0; -} - -static const struct bpf_func_proto bpf_xdp_adjust_tail_proto = { - .func = bpf_xdp_adjust_tail, - .gpl_only = false, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_CTX, - .arg2_type = ARG_ANYTHING, -}; - -BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset) -{ - void *xdp_frame_end = xdp->data_hard_start + sizeof(struct xdp_frame); - void *meta = xdp->data_meta + offset; - unsigned long metalen = xdp->data - meta; - - if (xdp_data_meta_unsupported(xdp)) - return -ENOTSUPP; - if (unlikely(meta < xdp_frame_end || - meta > xdp->data)) - return -EINVAL; - if (unlikely(xdp_metalen_invalid(metalen))) - return -EACCES; - - xdp->data_meta = meta; - - return 0; -} - -static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = { - .func = bpf_xdp_adjust_meta, - .gpl_only = false, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_CTX, - .arg2_type = ARG_ANYTHING, -}; - -/* XDP_REDIRECT works by a three-step process, implemented in the functions - * below: - * - * 1. The bpf_redirect() and bpf_redirect_map() helpers will lookup the target - * of the redirect and store it (along with some other metadata) in a per-CPU - * struct bpf_redirect_info. - * - * 2. When the program returns the XDP_REDIRECT return code, the driver will - * call xdp_do_redirect() which will use the information in struct - * bpf_redirect_info to actually enqueue the frame into a map type-specific - * bulk queue structure. - * - * 3. Before exiting its NAPI poll loop, the driver will call xdp_do_flush(), - * which will flush all the different bulk queues, thus completing the - * redirect. - * - * Pointers to the map entries will be kept around for this whole sequence of - * steps, protected by RCU. However, there is no top-level rcu_read_lock() in - * the core code; instead, the RCU protection relies on everything happening - * inside a single NAPI poll sequence, which means it's between a pair of calls - * to local_bh_disable()/local_bh_enable(). - * - * The map entries are marked as __rcu and the map code makes sure to - * dereference those pointers with rcu_dereference_check() in a way that works - * for both sections that to hold an rcu_read_lock() and sections that are - * called from NAPI without a separate rcu_read_lock(). The code below does not - * use RCU annotations, but relies on those in the map code. 
- */ -void xdp_do_flush(void) -{ - __dev_flush(); - __cpu_map_flush(); - __xsk_map_flush(); -} -EXPORT_SYMBOL_GPL(xdp_do_flush); - -void bpf_clear_redirect_map(struct bpf_map *map) -{ - struct bpf_redirect_info *ri; - int cpu; - - for_each_possible_cpu(cpu) { - ri = per_cpu_ptr(&bpf_redirect_info, cpu); - /* Avoid polluting remote cacheline due to writes if - * not needed. Once we pass this test, we need the - * cmpxchg() to make sure it hasn't been changed in - * the meantime by remote CPU. - */ - if (unlikely(READ_ONCE(ri->map) == map)) - cmpxchg(&ri->map, map, NULL); - } -} - -DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key); -EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key); - -u32 xdp_master_redirect(struct xdp_buff *xdp) -{ - struct net_device *master, *slave; - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - - master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev); - slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); - if (slave && slave != xdp->rxq->dev) { - /* The target device is different from the receiving device, so - * redirect it to the new device. - * Using XDP_REDIRECT gets the correct behaviour from XDP enabled - * drivers to unmap the packet from their rx ring. - */ - ri->tgt_index = slave->ifindex; - ri->map_id = INT_MAX; - ri->map_type = BPF_MAP_TYPE_UNSPEC; - return XDP_REDIRECT; - } - return XDP_TX; -} -EXPORT_SYMBOL_GPL(xdp_master_redirect); - -static inline int __xdp_do_redirect_xsk(struct bpf_redirect_info *ri, - struct net_device *dev, - struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) -{ - enum bpf_map_type map_type = ri->map_type; - void *fwd = ri->tgt_value; - u32 map_id = ri->map_id; - int err; - - ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */ - ri->map_type = BPF_MAP_TYPE_UNSPEC; - - err = __xsk_map_redirect(fwd, xdp); - if (unlikely(err)) - goto err; - - _trace_xdp_redirect_map(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index); - return 0; -err: - _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err); - return err; -} - -static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri, - struct net_device *dev, - struct xdp_frame *xdpf, - struct bpf_prog *xdp_prog) -{ - enum bpf_map_type map_type = ri->map_type; - void *fwd = ri->tgt_value; - u32 map_id = ri->map_id; - struct bpf_map *map; - int err; - - ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */ - ri->map_type = BPF_MAP_TYPE_UNSPEC; - - if (unlikely(!xdpf)) { - err = -EOVERFLOW; - goto err; - } - - switch (map_type) { - case BPF_MAP_TYPE_DEVMAP: - fallthrough; - case BPF_MAP_TYPE_DEVMAP_HASH: - map = READ_ONCE(ri->map); - if (unlikely(map)) { - WRITE_ONCE(ri->map, NULL); - err = dev_map_enqueue_multi(xdpf, dev, map, - ri->flags & BPF_F_EXCLUDE_INGRESS); - } else { - err = dev_map_enqueue(fwd, xdpf, dev); - } - break; - case BPF_MAP_TYPE_CPUMAP: - err = cpu_map_enqueue(fwd, xdpf, dev); - break; - case BPF_MAP_TYPE_UNSPEC: - if (map_id == INT_MAX) { - fwd = dev_get_by_index_rcu(dev_net(dev), ri->tgt_index); - if (unlikely(!fwd)) { - err = -EINVAL; - break; - } - err = dev_xdp_enqueue(fwd, xdpf, dev); - break; - } - fallthrough; - default: - err = -EBADRQC; - } - - if (unlikely(err)) - goto err; - - _trace_xdp_redirect_map(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index); - return 0; -err: - _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err); - return err; -} - -int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, - struct bpf_prog 
*xdp_prog) -{ - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - enum bpf_map_type map_type = ri->map_type; - - /* XDP_REDIRECT is not fully supported yet for xdp frags since - * not all XDP capable drivers can map non-linear xdp_frame in - * ndo_xdp_xmit. - */ - if (unlikely(xdp_buff_has_frags(xdp) && - map_type != BPF_MAP_TYPE_CPUMAP)) - return -EOPNOTSUPP; - - if (map_type == BPF_MAP_TYPE_XSKMAP) - return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); - - return __xdp_do_redirect_frame(ri, dev, xdp_convert_buff_to_frame(xdp), - xdp_prog); -} -EXPORT_SYMBOL_GPL(xdp_do_redirect); - -int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp, - struct xdp_frame *xdpf, struct bpf_prog *xdp_prog) -{ - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - enum bpf_map_type map_type = ri->map_type; - - if (map_type == BPF_MAP_TYPE_XSKMAP) - return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); - - return __xdp_do_redirect_frame(ri, dev, xdpf, xdp_prog); -} -EXPORT_SYMBOL_GPL(xdp_do_redirect_frame); - -static int xdp_do_generic_redirect_map(struct net_device *dev, - struct sk_buff *skb, - struct xdp_buff *xdp, - struct bpf_prog *xdp_prog, - void *fwd, - enum bpf_map_type map_type, u32 map_id) -{ - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - struct bpf_map *map; - int err; - - switch (map_type) { - case BPF_MAP_TYPE_DEVMAP: - fallthrough; - case BPF_MAP_TYPE_DEVMAP_HASH: - map = READ_ONCE(ri->map); - if (unlikely(map)) { - WRITE_ONCE(ri->map, NULL); - err = dev_map_redirect_multi(dev, skb, xdp_prog, map, - ri->flags & BPF_F_EXCLUDE_INGRESS); - } else { - err = dev_map_generic_redirect(fwd, skb, xdp_prog); - } - if (unlikely(err)) - goto err; - break; - case BPF_MAP_TYPE_XSKMAP: - err = xsk_generic_rcv(fwd, xdp); - if (err) - goto err; - consume_skb(skb); - break; - case BPF_MAP_TYPE_CPUMAP: - err = cpu_map_generic_redirect(fwd, skb); - if (unlikely(err)) - goto err; - break; - default: - err = -EBADRQC; - goto err; - } - - _trace_xdp_redirect_map(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index); - return 0; -err: - _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err); - return err; -} - -int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, - struct xdp_buff *xdp, struct bpf_prog *xdp_prog) -{ - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - enum bpf_map_type map_type = ri->map_type; - void *fwd = ri->tgt_value; - u32 map_id = ri->map_id; - int err; - - ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */ - ri->map_type = BPF_MAP_TYPE_UNSPEC; - - if (map_type == BPF_MAP_TYPE_UNSPEC && map_id == INT_MAX) { - fwd = dev_get_by_index_rcu(dev_net(dev), ri->tgt_index); - if (unlikely(!fwd)) { - err = -EINVAL; - goto err; - } - - err = xdp_ok_fwd_dev(fwd, skb->len); - if (unlikely(err)) - goto err; - - skb->dev = fwd; - _trace_xdp_redirect(dev, xdp_prog, ri->tgt_index); - generic_xdp_tx(skb, xdp_prog); - return 0; - } - - return xdp_do_generic_redirect_map(dev, skb, xdp, xdp_prog, fwd, map_type, map_id); -err: - _trace_xdp_redirect_err(dev, xdp_prog, ri->tgt_index, err); - return err; -} - -BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags) -{ - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - - if (unlikely(flags)) - return XDP_ABORTED; - - /* NB! Map type UNSPEC and map_id == INT_MAX (never generated - * by map_idr) is used for ifindex based XDP redirect. 
- */ - ri->tgt_index = ifindex; - ri->map_id = INT_MAX; - ri->map_type = BPF_MAP_TYPE_UNSPEC; - - return XDP_REDIRECT; -} - -static const struct bpf_func_proto bpf_xdp_redirect_proto = { - .func = bpf_xdp_redirect, - .gpl_only = false, - .ret_type = RET_INTEGER, - .arg1_type = ARG_ANYTHING, - .arg2_type = ARG_ANYTHING, -}; - -BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, - u64, flags) -{ - return map->ops->map_redirect(map, ifindex, flags); -} - -static const struct bpf_func_proto bpf_xdp_redirect_map_proto = { - .func = bpf_xdp_redirect_map, - .gpl_only = false, - .ret_type = RET_INTEGER, - .arg1_type = ARG_CONST_MAP_PTR, - .arg2_type = ARG_ANYTHING, - .arg3_type = ARG_ANYTHING, -}; - static unsigned long bpf_skb_copy(void *dst_buff, const void *skb, unsigned long off, unsigned long len) { @@ -4830,55 +4195,6 @@ static const struct bpf_func_proto bpf_sk_ancestor_cgroup_id_proto = { }; #endif -static unsigned long bpf_xdp_copy(void *dst, const void *ctx, - unsigned long off, unsigned long len) -{ - struct xdp_buff *xdp = (struct xdp_buff *)ctx; - - bpf_xdp_copy_buf(xdp, off, dst, len, false); - return 0; -} - -BPF_CALL_5(bpf_xdp_event_output, struct xdp_buff *, xdp, struct bpf_map *, map, - u64, flags, void *, meta, u64, meta_size) -{ - u64 xdp_size = (flags & BPF_F_CTXLEN_MASK) >> 32; - - if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK))) - return -EINVAL; - - if (unlikely(!xdp || xdp_size > xdp_get_buff_len(xdp))) - return -EFAULT; - - return bpf_event_output(map, flags, meta, meta_size, xdp, - xdp_size, bpf_xdp_copy); -} - -static const struct bpf_func_proto bpf_xdp_event_output_proto = { - .func = bpf_xdp_event_output, - .gpl_only = true, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_CTX, - .arg2_type = ARG_CONST_MAP_PTR, - .arg3_type = ARG_ANYTHING, - .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY, - .arg5_type = ARG_CONST_SIZE_OR_ZERO, -}; - -BTF_ID_LIST_SINGLE(bpf_xdp_output_btf_ids, struct, xdp_buff) - -const struct bpf_func_proto bpf_xdp_output_proto = { - .func = bpf_xdp_event_output, - .gpl_only = true, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_BTF_ID, - .arg1_btf_id = &bpf_xdp_output_btf_ids[0], - .arg2_type = ARG_CONST_MAP_PTR, - .arg3_type = ARG_ANYTHING, - .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY, - .arg5_type = ARG_CONST_SIZE_OR_ZERO, -}; - BPF_CALL_1(bpf_get_socket_cookie, struct sk_buff *, skb) { return skb->sk ? 
__sock_gen_cookie(skb->sk) : 0; @@ -6957,46 +6273,6 @@ BPF_CALL_1(bpf_skb_ecn_set_ce, struct sk_buff *, skb) return INET_ECN_set_ce(skb); } -bool bpf_xdp_sock_is_valid_access(int off, int size, enum bpf_access_type type, - struct bpf_insn_access_aux *info) -{ - if (off < 0 || off >= offsetofend(struct bpf_xdp_sock, queue_id)) - return false; - - if (off % size != 0) - return false; - - switch (off) { - default: - return size == sizeof(__u32); - } -} - -u32 bpf_xdp_sock_convert_ctx_access(enum bpf_access_type type, - const struct bpf_insn *si, - struct bpf_insn *insn_buf, - struct bpf_prog *prog, u32 *target_size) -{ - struct bpf_insn *insn = insn_buf; - -#define BPF_XDP_SOCK_GET(FIELD) \ - do { \ - BUILD_BUG_ON(sizeof_field(struct xdp_sock, FIELD) > \ - sizeof_field(struct bpf_xdp_sock, FIELD)); \ - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_sock, FIELD),\ - si->dst_reg, si->src_reg, \ - offsetof(struct xdp_sock, FIELD)); \ - } while (0) - - switch (si->off) { - case offsetof(struct bpf_xdp_sock, queue_id): - BPF_XDP_SOCK_GET(queue_id); - break; - } - - return insn - insn_buf; -} - static const struct bpf_func_proto bpf_skb_ecn_set_ce_proto = { .func = bpf_skb_ecn_set_ce, .gpl_only = false, @@ -7569,12 +6845,10 @@ bool bpf_helper_changes_pkt_data(void *func) func == bpf_clone_redirect || func == bpf_l3_csum_replace || func == bpf_l4_csum_replace || - func == bpf_xdp_adjust_head || - func == bpf_xdp_adjust_meta || + xdp_helper_changes_pkt_data(func) || func == bpf_msg_pull_data || func == bpf_msg_push_data || func == bpf_msg_pop_data || - func == bpf_xdp_adjust_tail || #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF) func == bpf_lwt_seg6_store_bytes || func == bpf_lwt_seg6_adjust_srh || @@ -7929,32 +7203,11 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) } } -static const struct bpf_func_proto * -xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +const struct bpf_func_proto *xdp_inet_func_proto(enum bpf_func_id func_id) { switch (func_id) { - case BPF_FUNC_perf_event_output: - return &bpf_xdp_event_output_proto; - case BPF_FUNC_get_smp_processor_id: - return &bpf_get_smp_processor_id_proto; case BPF_FUNC_csum_diff: return &bpf_csum_diff_proto; - case BPF_FUNC_xdp_adjust_head: - return &bpf_xdp_adjust_head_proto; - case BPF_FUNC_xdp_adjust_meta: - return &bpf_xdp_adjust_meta_proto; - case BPF_FUNC_redirect: - return &bpf_xdp_redirect_proto; - case BPF_FUNC_redirect_map: - return &bpf_xdp_redirect_map_proto; - case BPF_FUNC_xdp_adjust_tail: - return &bpf_xdp_adjust_tail_proto; - case BPF_FUNC_xdp_get_buff_len: - return &bpf_xdp_get_buff_len_proto; - case BPF_FUNC_xdp_load_bytes: - return &bpf_xdp_load_bytes_proto; - case BPF_FUNC_xdp_store_bytes: - return &bpf_xdp_store_bytes_proto; case BPF_FUNC_fib_lookup: return &bpf_xdp_fib_lookup_proto; case BPF_FUNC_check_mtu: @@ -8643,64 +7896,6 @@ static bool tc_cls_act_is_valid_access(int off, int size, return bpf_skb_is_valid_access(off, size, type, prog, info); } -static bool __is_valid_xdp_access(int off, int size) -{ - if (off < 0 || off >= sizeof(struct xdp_md)) - return false; - if (off % size != 0) - return false; - if (size != sizeof(__u32)) - return false; - - return true; -} - -static bool xdp_is_valid_access(int off, int size, - enum bpf_access_type type, - const struct bpf_prog *prog, - struct bpf_insn_access_aux *info) -{ - if (prog->expected_attach_type != BPF_XDP_DEVMAP) { - switch (off) { - case offsetof(struct xdp_md, egress_ifindex): - return false; - } - } - - if (type == BPF_WRITE) { 
- if (bpf_prog_is_dev_bound(prog->aux)) { - switch (off) { - case offsetof(struct xdp_md, rx_queue_index): - return __is_valid_xdp_access(off, size); - } - } - return false; - } - - switch (off) { - case offsetof(struct xdp_md, data): - info->reg_type = PTR_TO_PACKET; - break; - case offsetof(struct xdp_md, data_meta): - info->reg_type = PTR_TO_PACKET_META; - break; - case offsetof(struct xdp_md, data_end): - info->reg_type = PTR_TO_PACKET_END; - break; - } - - return __is_valid_xdp_access(off, size); -} - -void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act) -{ - const u32 act_max = XDP_REDIRECT; - - pr_warn_once("%s XDP return value %u on prog %s (id %d) dev %s, expect packet loss!\n", - act > act_max ? "Illegal" : "Driver unsupported", - act, prog->aux->name, prog->aux->id, dev ? dev->name : "N/A"); -} -EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action); static bool sock_addr_is_valid_access(int off, int size, enum bpf_access_type type, @@ -9705,62 +8900,6 @@ static u32 tc_cls_act_convert_ctx_access(enum bpf_access_type type, return insn - insn_buf; } -static u32 xdp_convert_ctx_access(enum bpf_access_type type, - const struct bpf_insn *si, - struct bpf_insn *insn_buf, - struct bpf_prog *prog, u32 *target_size) -{ - struct bpf_insn *insn = insn_buf; - - switch (si->off) { - case offsetof(struct xdp_md, data): - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data), - si->dst_reg, si->src_reg, - offsetof(struct xdp_buff, data)); - break; - case offsetof(struct xdp_md, data_meta): - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data_meta), - si->dst_reg, si->src_reg, - offsetof(struct xdp_buff, data_meta)); - break; - case offsetof(struct xdp_md, data_end): - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data_end), - si->dst_reg, si->src_reg, - offsetof(struct xdp_buff, data_end)); - break; - case offsetof(struct xdp_md, ingress_ifindex): - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq), - si->dst_reg, si->src_reg, - offsetof(struct xdp_buff, rxq)); - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_rxq_info, dev), - si->dst_reg, si->dst_reg, - offsetof(struct xdp_rxq_info, dev)); - *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, - offsetof(struct net_device, ifindex)); - break; - case offsetof(struct xdp_md, rx_queue_index): - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq), - si->dst_reg, si->src_reg, - offsetof(struct xdp_buff, rxq)); - *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, - offsetof(struct xdp_rxq_info, - queue_index)); - break; - case offsetof(struct xdp_md, egress_ifindex): - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, txq), - si->dst_reg, si->src_reg, - offsetof(struct xdp_buff, txq)); - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_txq_info, dev), - si->dst_reg, si->dst_reg, - offsetof(struct xdp_txq_info, dev)); - *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, - offsetof(struct net_device, ifindex)); - break; - } - - return insn - insn_buf; -} - /* SOCK_ADDR_LOAD_NESTED_FIELD() loads Nested Field S.F.NF where S is type of * context Structure, F is Field in context structure that contains a pointer * to Nested Structure of type NS that has the field NF. 
@@ -10602,17 +9741,6 @@ const struct bpf_prog_ops tc_cls_act_prog_ops = { .test_run = bpf_prog_test_run_skb, }; -const struct bpf_verifier_ops xdp_verifier_ops = { - .get_func_proto = xdp_func_proto, - .is_valid_access = xdp_is_valid_access, - .convert_ctx_access = xdp_convert_ctx_access, - .gen_prologue = bpf_noop_prologue, -}; - -const struct bpf_prog_ops xdp_prog_ops = { - .test_run = bpf_prog_test_run_xdp, -}; - const struct bpf_verifier_ops cg_skb_verifier_ops = { .get_func_proto = cg_skb_func_proto, .is_valid_access = cg_skb_is_valid_access, @@ -11266,13 +10394,6 @@ const struct bpf_verifier_ops sk_lookup_verifier_ops = { #endif /* CONFIG_INET */ -DEFINE_BPF_DISPATCHER(xdp) - -void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog) -{ - bpf_dispatcher_change_prog(BPF_DISPATCHER_PTR(xdp), prev_prog, prog); -} - BTF_ID_LIST_GLOBAL(btf_sock_ids, MAX_BTF_SOCK_TYPE) #define BTF_SOCK_TYPE(name, type) BTF_ID(struct, type) BTF_SOCK_TYPE_xxx From patchwork Tue Jun 28 19:47:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898839 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2294C433EF for ; Tue, 28 Jun 2022 19:51:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232574AbiF1TvF (ORCPT ); Tue, 28 Jun 2022 15:51:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229571AbiF1Tut (ORCPT ); Tue, 28 Jun 2022 15:50:49 -0400 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 00C343A70F; Tue, 28 Jun 2022 12:49:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445749; x=1687981749; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sU25U5z5Fc524fPGWGwlzdwktQUtdwxy/KN7ZKjuDgg=; b=mq8Q9LSUF6oCyxKAiCE/RoEeJVrMWv4z20yJDeMLi721w6LSj4xiotIy r+lewqVNNKqXZiuSi6UTDwCKanerJWk1qMbGdGLareZFKNiSL/mhBCXvs xhVAh1KeqzrG7xjvI9pc2oeuX+z/WsqeH3nzecT4TU7P8/COQmfbCZmZz z9nkE/hx/V1I41R63Rc0I2y+rpnTbDvpxRqz5qEhcZHj81HIgSfm0c387 FD/TweloT9rueCIu/A97hXz1+U5HgIRN6sqcAyLbpth/dDj46D4PAqkOv O5Dwnz5tK3yCMOxsCowmleIoTAM4AS0+/wtusBWLfp7GqMyjC4dhUWaGX A==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="282568051" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="282568051" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="590426284" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga002.jf.intel.com with ESMTP; 28 Jun 2022 12:49:03 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr96022013; Tue, 28 Jun 2022 20:49:02 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus 
Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 06/52] bpf: pass a pointer to union bpf_attr to bpf_link_ops::update_prog() Date: Tue, 28 Jun 2022 21:47:26 +0200 Message-Id: <20220628194812.1453059-7-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC In order to be able to use any arbitrary data from bpf_attr::link_update inside the bpf_link_ops::update_prog() implementations, pass a pointer to the whole attr as a callback argument. @new_prog and @old_prog arguments are still here as ::link_update contains only their FDs. Signed-off-by: Alexander Lobakin --- include/linux/bpf.h | 3 ++- kernel/bpf/bpf_iter.c | 1 + kernel/bpf/cgroup.c | 4 +++- kernel/bpf/net_namespace.c | 1 + kernel/bpf/syscall.c | 2 +- net/bpf/dev.c | 4 +++- 6 files changed, 11 insertions(+), 4 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index d05e1495a06e..c08690a49011 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1155,7 +1155,8 @@ struct bpf_link_ops { void (*release)(struct bpf_link *link); void (*dealloc)(struct bpf_link *link); int (*detach)(struct bpf_link *link); - int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog, + int (*update_prog)(struct bpf_link *link, const union bpf_attr *attr, + struct bpf_prog *new_prog, struct bpf_prog *old_prog); void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq); int (*fill_link_info)(const struct bpf_link *link, diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index 7e8fd49406f6..1d3dcc853f70 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -400,6 +400,7 @@ static void bpf_iter_link_dealloc(struct bpf_link *link) } static int bpf_iter_link_replace(struct bpf_link *link, + const union bpf_attr *attr, struct bpf_prog *new_prog, struct bpf_prog *old_prog) { diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 7a394f7c205c..f4d8100dd22f 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -664,7 +664,9 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp, return 0; } -static int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *new_prog, +static int cgroup_bpf_replace(struct bpf_link *link, + const union bpf_attr *attr, + struct bpf_prog *new_prog, struct bpf_prog *old_prog) { struct bpf_cgroup_link *cg_link; diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c index 868cc2c43899..5d80a4a9d0bd 100644 --- a/kernel/bpf/net_namespace.c +++ b/kernel/bpf/net_namespace.c @@ -162,6 +162,7 @@ static void bpf_netns_link_dealloc(struct bpf_link *link) } static int bpf_netns_link_update_prog(struct bpf_link *link, + const union bpf_attr *attr, struct bpf_prog *new_prog, struct bpf_prog *old_prog) { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 7d5af5b99f0d..f7a674656067 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4614,7 +4614,7 @@ static int link_update(union bpf_attr *attr) } if (link->ops->update_prog) - ret = 
link->ops->update_prog(link, new_prog, old_prog); + ret = link->ops->update_prog(link, attr, new_prog, old_prog); else ret = -EINVAL; diff --git a/net/bpf/dev.c b/net/bpf/dev.c index dfe0402947f8..68a7b2c49392 100644 --- a/net/bpf/dev.c +++ b/net/bpf/dev.c @@ -619,7 +619,9 @@ static int bpf_xdp_link_fill_link_info(const struct bpf_link *link, return 0; } -static int bpf_xdp_link_update(struct bpf_link *link, struct bpf_prog *new_prog, +static int bpf_xdp_link_update(struct bpf_link *link, + const union bpf_attr *attr, + struct bpf_prog *new_prog, struct bpf_prog *old_prog) { struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); From patchwork Tue Jun 28 19:47:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898847 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C087C43334 for ; Tue, 28 Jun 2022 19:51:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230307AbiF1Tv1 (ORCPT ); Tue, 28 Jun 2022 15:51:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230118AbiF1Tuv (ORCPT ); Tue, 28 Jun 2022 15:50:51 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0A8353A713; Tue, 28 Jun 2022 12:49:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445757; x=1687981757; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=UA7ojFhtLpSWs3CIAGleRKPCFOtCm25p70d35QFcy9U=; b=ZIPZUwJrQuf24f83SkKWjkmKrYpt9BlkXkmmNTKV4IuMaIkALTxSFV59 1qwn9rIIkhMnc18ePfHo/9THxKnh8TMTvyXoO//3fZ2EUWxR7VGafYe1R 5qeDise7WSAKolsgpaIFIJ7Pei9k1MM8ytBXX7vnhOJs+usYVGLLZzK/t PG//VZ+4ghSrxVJivWg0/p0MC9efp9fAVEOogI8t2vcZd/byoWAvKHZEf Iobr8zJRd7MBCDIVfQMFTZv/Fga1Nte4o1AQUDBrNJ9TQVPPXcoyo3K9Y NN9fuIQkkivwCyz1PuDEC4O1BNjAGqkJkDsSpprlaS0LahZF8lNRm8Ea2 g==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="280595783" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="280595783" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="594927476" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga007.fm.intel.com with ESMTP; 28 Jun 2022 12:49:05 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr97022013; Tue, 28 Jun 2022 20:49:03 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 07/52] net, xdp: remove redundant arguments from dev_xdp_{at,de}tach_link() Date: Tue, 28 Jun 2022 21:47:27 +0200 Message-Id: <20220628194812.1453059-8-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC dev_xdp_attach_link(): the sole caller always passes %NULL as @extack and @link->dev as @dev, so they both can be omitted. The very same story with dev_xdp_detach_link(): remove both @dev and @extack as they both can be obtained inside the function itself. This decreases stack usage with no functional changes. Signed-off-by: Alexander Lobakin --- net/bpf/dev.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/net/bpf/dev.c b/net/bpf/dev.c index 68a7b2c49392..0010b20719e8 100644 --- a/net/bpf/dev.c +++ b/net/bpf/dev.c @@ -534,17 +534,14 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack return 0; } -static int dev_xdp_attach_link(struct net_device *dev, - struct netlink_ext_ack *extack, - struct bpf_xdp_link *link) +static int dev_xdp_attach_link(struct bpf_xdp_link *link) { - return dev_xdp_attach(dev, extack, link, NULL, NULL, link->flags); + return dev_xdp_attach(link->dev, NULL, link, NULL, NULL, link->flags); } -static int dev_xdp_detach_link(struct net_device *dev, - struct netlink_ext_ack *extack, - struct bpf_xdp_link *link) +static int dev_xdp_detach_link(struct bpf_xdp_link *link) { + struct net_device *dev = link->dev; enum bpf_xdp_mode mode; bpf_op_t bpf_op; @@ -570,7 +567,7 @@ static void bpf_xdp_link_release(struct bpf_link *link) * already NULL, in which case link was already auto-detached */ if (xdp_link->dev) { - WARN_ON(dev_xdp_detach_link(xdp_link->dev, NULL, xdp_link)); + WARN_ON(dev_xdp_detach_link(xdp_link)); xdp_link->dev = NULL; } @@ -709,7 +706,7 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) goto unlock; } - err = dev_xdp_attach_link(dev, NULL, link); + err = dev_xdp_attach_link(link); rtnl_unlock(); if (err) { From patchwork Tue Jun 28 19:47:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898840 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C15C3CCA47F for ; Tue, 28 Jun 2022 19:51:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232655AbiF1TvK (ORCPT ); Tue, 28 Jun 2022 15:51:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45686 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231398AbiF1Tut (ORCPT ); Tue, 28 Jun 2022 15:50:49 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A444EA7; Tue, 28 Jun 2022 12:49:11 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445751; x=1687981751; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fLcsoB6VtjEUPeJ9dvKqgy5Y7I+Rr4vHLG6qaoWyTqw=; b=UvHRfJQLm2wKmvyZRwbRT9mjg2nrLe5aJje+ZoxrQbEsAJx8mEPVMgw/ Onp9uMlxNJDs2pkzP7BuDmXPWNOVPZeZbH7cR6iKwdyC6guf+qGZDtwmp /jZtzOx/LS2cXZDyftWxALH+LcNCqa39RoOVKqyrSudWWbSL0XMlgp2r6 GiDBbP0h5EfCNm1hgSijUviVb2q7a5doxyppZt11CyZSBrY/OujcY2Fzf RxJY92cVEwaZVWLfbRUW5VVpxSJW6bw70SBwpxBTvmJwGFWSL24gFsmyR j3RubVWqmdpwMrEA8SbNyqeMXaszP9KB3161KXEHDlj+dxa3r02uOwtQN A==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="264874035" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="264874035" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="693250940" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga002.fm.intel.com with ESMTP; 28 Jun 2022 12:49:06 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr98022013; Tue, 28 Jun 2022 20:49:04 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 08/52] net, xdp: factor out XDP install arguments to a separate structure Date: Tue, 28 Jun 2022 21:47:28 +0200 Message-Id: <20220628194812.1453059-9-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC The current way of passing parameters from userland/rtnetlink (do_setlink()) down to dev_change_xdp_fd() and, in the end, to the drivers as separate arguments does not scale well: each new parameter requires changing the prototypes of several functions at once. To be able to pass more, gather them into a structure, which for now contains: * dev, the actual netdevice, * extack, Netlink extack to pass arbitrary messages to userland, * flags, XDP install flags passed from the user, and use it in the following functions instead of the separate arguments: dev_change_xdp_fd(), dev_xdp_attach() and dev_xdp_install(). Adjust the rest accordingly. Those three are used in the whole 'user -> driver' chain; the rest can {,dis}appear later and thus are not included.
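To illustrate, here is roughly how a call site changes with this patch (a condensed sketch of the do_setlink() hunk below; error handling and the IFLA_XDP_EXPECTED_FD parsing are elided):

	struct xdp_install_args args = {
		.dev	= dev,
		.extack	= extack,
		.flags	= xdp_flags,
	};

	/* was: dev_change_xdp_fd(dev, extack, fd, expected_fd, xdp_flags) */
	err = dev_change_xdp_fd(&args, nla_get_s32(xdp[IFLA_XDP_FD]),
				expected_fd);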
Signed-off-by: Alexander Lobakin --- include/linux/netdevice.h | 10 +++++-- net/bpf/dev.c | 61 ++++++++++++++++++++++++--------------- net/core/rtnetlink.c | 10 +++++-- 3 files changed, 53 insertions(+), 28 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0b8169c23f22..1e342c285f48 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3848,11 +3848,17 @@ struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *d struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq, int *ret); +struct xdp_install_args { + struct net_device *dev; + struct netlink_ext_ack *extack; + u32 flags; +}; + DECLARE_STATIC_KEY_FALSE(generic_xdp_needed_key); int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); -int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, - int fd, int expected_fd, u32 flags); +int dev_change_xdp_fd(const struct xdp_install_args *args, int fd, + int expected_fd); void dev_xdp_uninstall(struct net_device *dev); u8 dev_xdp_prog_count(struct net_device *dev); u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode); diff --git a/net/bpf/dev.c b/net/bpf/dev.c index 0010b20719e8..7df42bb886ad 100644 --- a/net/bpf/dev.c +++ b/net/bpf/dev.c @@ -350,17 +350,17 @@ static void dev_xdp_set_prog(struct net_device *dev, enum bpf_xdp_mode mode, dev->xdp_state[mode].prog = prog; } -static int dev_xdp_install(struct net_device *dev, enum bpf_xdp_mode mode, - bpf_op_t bpf_op, struct netlink_ext_ack *extack, - u32 flags, struct bpf_prog *prog) +static int dev_xdp_install(const struct xdp_install_args *args, + enum bpf_xdp_mode mode, bpf_op_t bpf_op, + struct bpf_prog *prog) { struct netdev_bpf xdp; int err; memset(&xdp, 0, sizeof(xdp)); xdp.command = mode == XDP_MODE_HW ? 
XDP_SETUP_PROG_HW : XDP_SETUP_PROG; - xdp.extack = extack; - xdp.flags = flags; + xdp.extack = args->extack; + xdp.flags = args->flags; xdp.prog = prog; /* Drivers assume refcnt is already incremented (i.e, prog pointer is @@ -371,7 +371,7 @@ static int dev_xdp_install(struct net_device *dev, enum bpf_xdp_mode mode, */ if (prog) bpf_prog_inc(prog); - err = bpf_op(dev, &xdp); + err = bpf_op(args->dev, &xdp); if (err) { if (prog) bpf_prog_put(prog); @@ -379,13 +379,16 @@ static int dev_xdp_install(struct net_device *dev, enum bpf_xdp_mode mode, } if (mode != XDP_MODE_HW) - bpf_prog_change_xdp(dev_xdp_prog(dev, mode), prog); + bpf_prog_change_xdp(dev_xdp_prog(args->dev, mode), prog); return 0; } void dev_xdp_uninstall(struct net_device *dev) { + struct xdp_install_args args = { + .dev = dev, + }; struct bpf_xdp_link *link; struct bpf_prog *prog; enum bpf_xdp_mode mode; @@ -402,7 +405,7 @@ void dev_xdp_uninstall(struct net_device *dev) if (!bpf_op) continue; - WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL)); + WARN_ON(dev_xdp_install(&args, mode, bpf_op, NULL)); /* auto-detach link from net device */ link = dev_xdp_link(dev, mode); @@ -415,13 +418,16 @@ void dev_xdp_uninstall(struct net_device *dev) } } -static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack, +static int dev_xdp_attach(const struct xdp_install_args *args, struct bpf_xdp_link *link, struct bpf_prog *new_prog, - struct bpf_prog *old_prog, u32 flags) + struct bpf_prog *old_prog) { - unsigned int num_modes = hweight32(flags & XDP_FLAGS_MODES); + unsigned int num_modes = hweight32(args->flags & XDP_FLAGS_MODES); + struct netlink_ext_ack *extack = args->extack; + struct net_device *dev = args->dev; struct bpf_prog *cur_prog; struct net_device *upper; + u32 flags = args->flags; struct list_head *iter; enum bpf_xdp_mode mode; bpf_op_t bpf_op; @@ -519,7 +525,7 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack return -EOPNOTSUPP; } - err = dev_xdp_install(dev, mode, bpf_op, extack, flags, new_prog); + err = dev_xdp_install(args, mode, bpf_op, new_prog); if (err) return err; } @@ -536,12 +542,20 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack static int dev_xdp_attach_link(struct bpf_xdp_link *link) { - return dev_xdp_attach(link->dev, NULL, link, NULL, NULL, link->flags); + struct xdp_install_args args = { + .dev = link->dev, + .flags = link->flags, + }; + + return dev_xdp_attach(&args, link, NULL, NULL); } static int dev_xdp_detach_link(struct bpf_xdp_link *link) { struct net_device *dev = link->dev; + struct xdp_install_args args = { + .dev = dev, + }; enum bpf_xdp_mode mode; bpf_op_t bpf_op; @@ -552,7 +566,7 @@ static int dev_xdp_detach_link(struct bpf_xdp_link *link) return -EINVAL; bpf_op = dev_xdp_bpf_op(dev, mode); - WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL)); + WARN_ON(dev_xdp_install(&args, mode, bpf_op, NULL)); dev_xdp_set_link(dev, mode, NULL); return 0; } @@ -622,6 +636,10 @@ static int bpf_xdp_link_update(struct bpf_link *link, struct bpf_prog *old_prog) { struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); + struct xdp_install_args args = { + .dev = xdp_link->dev, + .flags = xdp_link->flags, + }; enum bpf_xdp_mode mode; bpf_op_t bpf_op; int err = 0; @@ -653,8 +671,7 @@ static int bpf_xdp_link_update(struct bpf_link *link, mode = dev_xdp_mode(xdp_link->dev, xdp_link->flags); bpf_op = dev_xdp_bpf_op(xdp_link->dev, mode); - err = dev_xdp_install(xdp_link->dev, mode, bpf_op, 
NULL, - xdp_link->flags, new_prog); + err = dev_xdp_install(&args, mode, bpf_op, new_prog); if (err) goto out_unlock; @@ -730,18 +747,16 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) /** * dev_change_xdp_fd - set or clear a bpf program for a device rx path - * @dev: device - * @extack: netlink extended ack + * @args: common XDP arguments (device, extended ack, flags etc.) * @fd: new program fd or negative value to clear * @expected_fd: old program fd that userspace expects to replace or clear - * @flags: xdp-related flags * * Set or clear a bpf program for a device */ -int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, - int fd, int expected_fd, u32 flags) +int dev_change_xdp_fd(const struct xdp_install_args *args, int fd, + int expected_fd) { - enum bpf_xdp_mode mode = dev_xdp_mode(dev, flags); + enum bpf_xdp_mode mode = dev_xdp_mode(args->dev, args->flags); struct bpf_prog *new_prog = NULL, *old_prog = NULL; int err; @@ -764,7 +779,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, } } - err = dev_xdp_attach(dev, extack, NULL, new_prog, old_prog, flags); + err = dev_xdp_attach(args, NULL, new_prog, old_prog); err_out: if (err && new_prog) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index ac45328607f7..5b06ded689b2 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2987,6 +2987,11 @@ static int do_setlink(const struct sk_buff *skb, } if (xdp[IFLA_XDP_FD]) { + struct xdp_install_args args = { + .dev = dev, + .extack = extack, + .flags = xdp_flags, + }; int expected_fd = -1; if (xdp_flags & XDP_FLAGS_REPLACE) { @@ -2998,10 +3003,9 @@ static int do_setlink(const struct sk_buff *skb, nla_get_s32(xdp[IFLA_XDP_EXPECTED_FD]); } - err = dev_change_xdp_fd(dev, extack, + err = dev_change_xdp_fd(&args, nla_get_s32(xdp[IFLA_XDP_FD]), - expected_fd, - xdp_flags); + expected_fd); if (err) goto errout; status |= DO_SETLINK_NOTIFY; From patchwork Tue Jun 28 19:47:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898843 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2446C433EF for ; Tue, 28 Jun 2022 19:51:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232733AbiF1TvP (ORCPT ); Tue, 28 Jun 2022 15:51:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230449AbiF1Tuu (ORCPT ); Tue, 28 Jun 2022 15:50:50 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7E6AC17E3F; Tue, 28 Jun 2022 12:49:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445752; x=1687981752; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SqWbOCdnhWLJ6BpjYS5/roLulPLkxRjJFV89S0QmGTs=; b=f6B+pJGrIU4mqL40TbHGKn3zn85qew+G1Z9pDz14rNBJandV19G8eApk Z2Y+r0Vk/NHrnTGheCBOzVJPYqSNdCzW/0A1BuvKEPgTPisQ2fpLr4O/7 AOv9yi0JfGtXTz7vkCRXDyy/MP64EqJFOvxEpguvvbdulwad1QgSYizcU y8ogWlEFiUvmfEu6JPaycpPwPU/Ghs9EK76+9tXUyWNkJbIH9YY+wOrjZ 
+KRfUt5/QXm60/aXTC8WDykV8vcIZ3wP8jDyuzhEvYw9Hmh4hhLRJ8YtH sk9wjOAwNvZvMpRWrgHwhZOgNoI5irPZ+1xM1KKIg2JK1g2DR6/2l+zih g==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="264874045" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="264874045" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:12 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="693250944" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga002.fm.intel.com with ESMTP; 28 Jun 2022 12:49:07 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr99022013; Tue, 28 Jun 2022 20:49:06 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 09/52] net, xdp: add ability to specify BTF ID for XDP metadata Date: Tue, 28 Jun 2022 21:47:29 +0200 Message-Id: <20220628194812.1453059-10-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add the UAPI and the corresponding kernel part to be able to specify the BTF ID of the format which the drivers should compose metadata in (if supported). A driver might be able to provide XDP metadata in different formats, e.g. the generic one and one or several custom ones (with some non-universal data from DMA descriptors etc.). In this case, a BPF loader program specifies the wanted BTF ID, and BPF and AF_XDP programs then expect this format in the XDP metadata, comparing their known BTF IDs against the one placed in front of each frame. The BTF ID can be set and updated via both the BPF link and rtnetlink (the %IFLA_XDP_BTF_ID attribute) interfaces, can be read back via &bpf_link_info, and is passed to the drivers inside &netdev_bpf. net_device_ops::ndo_bpf() is now being called not only when @new_prog != @old_prog, but also when @new_prog == @old_prog && @new_btf_id != @btf_id, so the drivers should be able to handle such cases.
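For a rough picture of the consumer side, here is a minimal sketch of a BPF program checking the metadata format before trusting it. Note that the struct my_meta layout, the placement of the ID at the very end of the metadata area and the EXPECTED_BTF_ID value are assumptions made up for this example, not something this patch defines:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	/* hypothetical layout: the BTF ID is the last member, i.e. it
	 * sits directly in front of the frame itself
	 */
	struct my_meta {
		__u32 rx_hash;	/* example driver-provided hint */
		__u64 btf_id;	/* metadata format identifier */
	};

	/* placeholder: in practice, the ID the loader requested */
	#define EXPECTED_BTF_ID 0x1234ULL

	SEC("xdp")
	int xdp_meta_check(struct xdp_md *ctx)
	{
		void *data = (void *)(long)ctx->data;
		void *meta = (void *)(long)ctx->data_meta;
		struct my_meta *md = meta;

		/* no/too small metadata: not composed for this frame */
		if (meta + sizeof(*md) > data)
			return XDP_PASS;

		/* foreign format: don't interpret the fields */
		if (md->btf_id != EXPECTED_BTF_ID)
			return XDP_PASS;

		/* safe to use md->rx_hash etc. from here on */
		return XDP_PASS;
	}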
Signed-off-by: Alexander Lobakin --- include/linux/netdevice.h | 2 ++ include/net/xdp.h | 1 + include/uapi/linux/bpf.h | 12 ++++++++++++ include/uapi/linux/if_link.h | 1 + kernel/bpf/syscall.c | 2 +- net/bpf/core.c | 1 + net/bpf/dev.c | 26 +++++++++++++++++++++++--- net/core/rtnetlink.c | 6 ++++++ tools/include/uapi/linux/bpf.h | 12 ++++++++++++ tools/include/uapi/linux/if_link.h | 1 + 10 files changed, 60 insertions(+), 4 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 1e342c285f48..2218c1901daf 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -985,6 +985,7 @@ struct netdev_bpf { /* XDP_SETUP_PROG */ struct { u32 flags; + u64 btf_id; struct bpf_prog *prog; struct netlink_ext_ack *extack; }; @@ -3852,6 +3853,7 @@ struct xdp_install_args { struct net_device *dev; struct netlink_ext_ack *extack; u32 flags; + u64 btf_id; }; DECLARE_STATIC_KEY_FALSE(generic_xdp_needed_key); diff --git a/include/net/xdp.h b/include/net/xdp.h index 04c852c7a77f..13133c7493bc 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -400,6 +400,7 @@ static inline bool xdp_metalen_invalid(unsigned long metalen) struct xdp_attachment_info { struct bpf_prog *prog; + u64 btf_id; u32 flags; }; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e81362891596..c67ddb78915d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1499,6 +1499,10 @@ union bpf_attr { */ __u64 cookie; } tracing; + struct { + /* target metadata BTF + type ID */ + __aligned_u64 btf_id; + } xdp; }; } link_create; @@ -1510,6 +1514,12 @@ union bpf_attr { /* expected link's program fd; is specified only if * BPF_F_REPLACE flag is set in flags */ __u32 old_prog_fd; + union { + struct { + /* new target metadata BTF + type ID */ + __aligned_u64 new_btf_id; + } xdp; + }; } link_update; struct { @@ -6138,6 +6148,8 @@ struct bpf_link_info { } netns; struct { __u32 ifindex; + __u32 :32; + __aligned_u64 btf_id; } xdp; }; } __attribute__((aligned(8))); diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 5f58dcfe2787..73cdcc86875e 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1307,6 +1307,7 @@ enum { IFLA_XDP_SKB_PROG_ID, IFLA_XDP_HW_PROG_ID, IFLA_XDP_EXPECTED_FD, + IFLA_XDP_BTF_ID, __IFLA_XDP_MAX, }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index f7a674656067..2e86cfeae10f 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4575,7 +4575,7 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) return ret; } -#define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd +#define BPF_LINK_UPDATE_LAST_FIELD link_update.xdp.new_btf_id static int link_update(union bpf_attr *attr) { diff --git a/net/bpf/core.c b/net/bpf/core.c index fbb72792320a..e5abd5a64df7 100644 --- a/net/bpf/core.c +++ b/net/bpf/core.c @@ -552,6 +552,7 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, if (info->prog) bpf_prog_put(info->prog); info->prog = bpf->prog; + info->btf_id = bpf->btf_id; info->flags = bpf->flags; } EXPORT_SYMBOL_GPL(xdp_attachment_setup); diff --git a/net/bpf/dev.c b/net/bpf/dev.c index 7df42bb886ad..e96986220126 100644 --- a/net/bpf/dev.c +++ b/net/bpf/dev.c @@ -273,6 +273,7 @@ struct bpf_xdp_link { struct bpf_link link; struct net_device *dev; /* protected by rtnl_lock, no refcnt held */ int flags; + u64 btf_id; }; typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); @@ -357,8 +358,13 @@ static int dev_xdp_install(const struct 
xdp_install_args *args, struct netdev_bpf xdp; int err; + /* BTF ID must not be set when uninstalling the program */ + if (!prog && args->btf_id) + return -EINVAL; + memset(&xdp, 0, sizeof(xdp)); xdp.command = mode == XDP_MODE_HW ? XDP_SETUP_PROG_HW : XDP_SETUP_PROG; + xdp.btf_id = args->btf_id; xdp.extack = args->extack; xdp.flags = args->flags; xdp.prog = prog; @@ -517,8 +523,11 @@ static int dev_xdp_attach(const struct xdp_install_args *args, } } - /* don't call drivers if the effective program didn't change */ - if (new_prog != cur_prog) { + /* don't call drivers if the effective program or BTF ID didn't change. + * If @link == %NULL, we don't know the old value, so the only thing we + * can do is to call installing unconditionally + */ + if (new_prog != cur_prog || !link || args->btf_id != link->btf_id) { bpf_op = dev_xdp_bpf_op(dev, mode); if (!bpf_op) { NL_SET_ERR_MSG(extack, "Underlying driver does not support XDP in native mode"); @@ -545,6 +554,7 @@ static int dev_xdp_attach_link(struct bpf_xdp_link *link) struct xdp_install_args args = { .dev = link->dev, .flags = link->flags, + .btf_id = link->btf_id, }; return dev_xdp_attach(&args, link, NULL, NULL); @@ -606,13 +616,16 @@ static void bpf_xdp_link_show_fdinfo(const struct bpf_link *link, { struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); u32 ifindex = 0; + u64 btf_id; rtnl_lock(); if (xdp_link->dev) ifindex = xdp_link->dev->ifindex; + btf_id = xdp_link->btf_id; rtnl_unlock(); seq_printf(seq, "ifindex:\t%u\n", ifindex); + seq_printf(seq, "btf_id:\t0x%llx\n", btf_id); } static int bpf_xdp_link_fill_link_info(const struct bpf_link *link, @@ -620,13 +633,16 @@ static int bpf_xdp_link_fill_link_info(const struct bpf_link *link, { struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); u32 ifindex = 0; + u64 btf_id; rtnl_lock(); if (xdp_link->dev) ifindex = xdp_link->dev->ifindex; + btf_id = xdp_link->btf_id; rtnl_unlock(); info->xdp.ifindex = ifindex; + info->xdp.btf_id = btf_id; return 0; } @@ -639,6 +655,7 @@ static int bpf_xdp_link_update(struct bpf_link *link, struct xdp_install_args args = { .dev = xdp_link->dev, .flags = xdp_link->flags, + .btf_id = attr->link_update.xdp.new_btf_id, }; enum bpf_xdp_mode mode; bpf_op_t bpf_op; @@ -663,7 +680,7 @@ static int bpf_xdp_link_update(struct bpf_link *link, goto out_unlock; } - if (old_prog == new_prog) { + if (old_prog == new_prog && args.btf_id == xdp_link->btf_id) { /* no-op, don't disturb drivers */ bpf_prog_put(new_prog); goto out_unlock; @@ -678,6 +695,8 @@ static int bpf_xdp_link_update(struct bpf_link *link, old_prog = xchg(&link->prog, new_prog); bpf_prog_put(old_prog); + xdp_link->btf_id = args.btf_id; + out_unlock: rtnl_unlock(); return err; @@ -716,6 +735,7 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) bpf_link_init(&link->link, BPF_LINK_TYPE_XDP, &bpf_xdp_link_lops, prog); link->dev = dev; link->flags = attr->link_create.flags; + link->btf_id = attr->link_create.xdp.btf_id; err = bpf_link_prime(&link->link, &link_primer); if (err) { diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 5b06ded689b2..a30723b0e50c 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1979,6 +1979,7 @@ static const struct nla_policy ifla_xdp_policy[IFLA_XDP_MAX + 1] = { [IFLA_XDP_ATTACHED] = { .type = NLA_U8 }, [IFLA_XDP_FLAGS] = { .type = NLA_U32 }, [IFLA_XDP_PROG_ID] = { .type = NLA_U32 }, + [IFLA_XDP_BTF_ID] = { .type = NLA_U64 }, }; static const struct rtnl_link_ops 
*linkinfo_to_kind_ops(const struct nlattr *nla) @@ -2962,6 +2963,7 @@ static int do_setlink(const struct sk_buff *skb, if (tb[IFLA_XDP]) { struct nlattr *xdp[IFLA_XDP_MAX + 1]; u32 xdp_flags = 0; + u64 btf_id = 0; err = nla_parse_nested_deprecated(xdp, IFLA_XDP_MAX, tb[IFLA_XDP], @@ -2986,10 +2988,14 @@ static int do_setlink(const struct sk_buff *skb, } } + if (xdp[IFLA_XDP_BTF_ID]) + btf_id = nla_get_u64(xdp[IFLA_XDP_BTF_ID]); + if (xdp[IFLA_XDP_FD]) { struct xdp_install_args args = { .dev = dev, .extack = extack, + .btf_id = btf_id, .flags = xdp_flags, }; int expected_fd = -1; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index e81362891596..c67ddb78915d 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1499,6 +1499,10 @@ union bpf_attr { */ __u64 cookie; } tracing; + struct { + /* target metadata BTF + type ID */ + __aligned_u64 btf_id; + } xdp; }; } link_create; @@ -1510,6 +1514,12 @@ union bpf_attr { /* expected link's program fd; is specified only if * BPF_F_REPLACE flag is set in flags */ __u32 old_prog_fd; + union { + struct { + /* new target metadata BTF + type ID */ + __aligned_u64 new_btf_id; + } xdp; + }; } link_update; struct { @@ -6138,6 +6148,8 @@ struct bpf_link_info { } netns; struct { __u32 ifindex; + __u32 :32; + __aligned_u64 btf_id; } xdp; }; } __attribute__((aligned(8))); diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h index b339bf2196ca..68b126678dc8 100644 --- a/tools/include/uapi/linux/if_link.h +++ b/tools/include/uapi/linux/if_link.h @@ -1212,6 +1212,7 @@ enum { IFLA_XDP_SKB_PROG_ID, IFLA_XDP_HW_PROG_ID, IFLA_XDP_EXPECTED_FD, + IFLA_XDP_BTF_ID, __IFLA_XDP_MAX, }; From patchwork Tue Jun 28 19:47:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898842 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D061CCA479 for ; Tue, 28 Jun 2022 19:51:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232692AbiF1TvM (ORCPT ); Tue, 28 Jun 2022 15:51:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230122AbiF1Tuu (ORCPT ); Tue, 28 Jun 2022 15:50:50 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A81337AA6; Tue, 28 Jun 2022 12:49:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445753; x=1687981753; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=QuDromnuxNsGAITyqqK2ZtJCvPPIlNl9f1QlOMWHD4E=; b=nlU/grdtEkCOQC+lV5rw4xBzxhQHIgaiZhbeRJOK8AM+IjKGH4lplqh8 5wAdBpOG9SUfXYLG1DJL4SjkFdcZ9dK4/8M/CTEIo7VacJQmBCTpCbDOu eJVNmspghBp61TvpwpWiDuZyndT9sTjtaw5Odax57gGi7feBE09fHD6UF BQK/IX6CKOifKxSFGLrch5Xyi6Uafc3yB0bb0Mh8t1LCviGRT6ycT9n8w f85UW07m+U78qVjDe855wfile3BHYE+7KymnHzsbepuiN1ANvHnrwh75U pkWuwuaWMBxhoXVrztmc06OxJ6C1y/wB3iDkhxkBVdd/cMBjrtBcYPQLL g==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="261635573" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="261635573" Received: from 
fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:13 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="658257470" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga004.fm.intel.com with ESMTP; 28 Jun 2022 12:49:09 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9A022013; Tue, 28 Jun 2022 20:49:07 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 10/52] net, xdp: add ability to specify frame size threshold for XDP metadata Date: Tue, 28 Jun 2022 21:47:30 +0200 Message-Id: <20220628194812.1453059-11-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add the UAPI and the corresponding kernel part to be able to specify the frame size which the drivers should start composing metadata from (if supported). Instead of having just 1 bit on/off, provide the possibility to set the threshold for drivers to start composing meta. This helps in situations when e.g. lots of traffic receives the %XDP_DROP verdict without the program ever looking at the meta. In such cases, the performance on small frames (< 96 bytes) can suffer by several Mpps with no benefit, so setting a threshold of 100-128 makes much sense. Setting it to 0 or 1 works just like a bitflag, values of 2-14 work like 1, values of SZ_16K+ work like 0. So, the logic in the drivers should look like: if (rx_desc->frame_size >= meta_thresh) compose_meta(); bpf_prog_run_xdp(); The threshold can be set and updated via both the BPF link and rtnetlink (the %IFLA_XDP_META_THRESH attribute) interfaces, can be read back via &bpf_link_info, and is passed to the drivers inside &netdev_bpf. net_device_ops::ndo_bpf() is now also being called when @new_prog == @old_prog && @new_btf_id == @btf_id && @new_meta_thresh != @meta_thresh.
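Expanding the snippet above into a fuller driver-side sketch (ring, rx_desc and drv_compose_xdp_meta() are made-up names for the example; the only parts this patch actually dictates are the >= comparison and the fact that 0 is converted to ~0, i.e. "never", before the value reaches the driver):

	/* ndo_bpf() / XDP_SETUP_PROG: bpf->meta_thresh arrives already
	 * converted, so 0 ("composing disabled") became ~0U here
	 */
	ring->meta_thresh = bpf->meta_thresh;

	/* Rx hot path: a single unsigned comparison, no extra branch
	 * for the "disabled" case
	 */
	if (le16_to_cpu(rx_desc->pkt_len) >= ring->meta_thresh)
		drv_compose_xdp_meta(ring, rx_desc, &xdp);

	act = bpf_prog_run_xdp(prog, &xdp);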
Signed-off-by: Alexander Lobakin --- include/linux/netdevice.h | 2 ++ include/net/xdp.h | 1 + include/uapi/linux/bpf.h | 10 +++++++- include/uapi/linux/if_link.h | 1 + kernel/bpf/syscall.c | 2 +- net/bpf/core.c | 1 + net/bpf/dev.c | 38 +++++++++++++++++++++++------- net/core/rtnetlink.c | 6 +++++ tools/include/uapi/linux/bpf.h | 10 +++++++- tools/include/uapi/linux/if_link.h | 1 + 10 files changed, 60 insertions(+), 12 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2218c1901daf..bc2d82a3d0de 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -985,6 +985,7 @@ struct netdev_bpf { /* XDP_SETUP_PROG */ struct { u32 flags; + u32 meta_thresh; u64 btf_id; struct bpf_prog *prog; struct netlink_ext_ack *extack; @@ -3853,6 +3854,7 @@ struct xdp_install_args { struct net_device *dev; struct netlink_ext_ack *extack; u32 flags; + u32 meta_thresh; u64 btf_id; }; diff --git a/include/net/xdp.h b/include/net/xdp.h index 13133c7493bc..7b8ba068d28a 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -401,6 +401,7 @@ static inline bool xdp_metalen_invalid(unsigned long metalen) struct xdp_attachment_info { struct bpf_prog *prog; u64 btf_id; + u32 meta_thresh; u32 flags; }; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index c67ddb78915d..372170ded1d8 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1502,6 +1502,10 @@ union bpf_attr { struct { /* target metadata BTF + type ID */ __aligned_u64 btf_id; + /* frame size to start composing XDP + * metadata from + */ + __u32 meta_thresh; } xdp; }; } link_create; @@ -1518,6 +1522,10 @@ union bpf_attr { struct { /* new target metadata BTF + type ID */ __aligned_u64 new_btf_id; + /* new frame size to start composing XDP + * metadata from + */ + __u32 new_meta_thresh; } xdp; }; } link_update; @@ -6148,7 +6156,7 @@ struct bpf_link_info { } netns; struct { __u32 ifindex; - __u32 :32; + __u32 meta_thresh; __aligned_u64 btf_id; } xdp; }; diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 73cdcc86875e..78b448ff1cb7 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1308,6 +1308,7 @@ enum { IFLA_XDP_HW_PROG_ID, IFLA_XDP_EXPECTED_FD, IFLA_XDP_BTF_ID, + IFLA_XDP_META_THRESH, __IFLA_XDP_MAX, }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 2e86cfeae10f..e1a56e62bdb4 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4575,7 +4575,7 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) return ret; } -#define BPF_LINK_UPDATE_LAST_FIELD link_update.xdp.new_btf_id +#define BPF_LINK_UPDATE_LAST_FIELD link_update.xdp.new_meta_thresh static int link_update(union bpf_attr *attr) { diff --git a/net/bpf/core.c b/net/bpf/core.c index e5abd5a64df7..dcd3b6ae86b7 100644 --- a/net/bpf/core.c +++ b/net/bpf/core.c @@ -553,6 +553,7 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, bpf_prog_put(info->prog); info->prog = bpf->prog; info->btf_id = bpf->btf_id; + info->meta_thresh = bpf->meta_thresh; info->flags = bpf->flags; } EXPORT_SYMBOL_GPL(xdp_attachment_setup); diff --git a/net/bpf/dev.c b/net/bpf/dev.c index e96986220126..82948d0536c8 100644 --- a/net/bpf/dev.c +++ b/net/bpf/dev.c @@ -273,6 +273,7 @@ struct bpf_xdp_link { struct bpf_link link; struct net_device *dev; /* protected by rtnl_lock, no refcnt held */ int flags; + u32 meta_thresh; u64 btf_id; }; @@ -358,12 +359,20 @@ static int dev_xdp_install(const struct xdp_install_args *args, struct netdev_bpf xdp; int err; - 
/* BTF ID must not be set when uninstalling the program */ - if (!prog && args->btf_id) + /* Neither BTF ID nor meta threshold can be set when uninstalling + * the program + */ + if (!prog && (args->btf_id || args->meta_thresh)) + return -EINVAL; + + /* Both meta threshold and BTF ID must be either specified or not */ + if (!args->btf_id != !args->meta_thresh) return -EINVAL; memset(&xdp, 0, sizeof(xdp)); xdp.command = mode == XDP_MODE_HW ? XDP_SETUP_PROG_HW : XDP_SETUP_PROG; + /* Convert 0 to "infinity" to allow plain >= comparison on hotpath */ + xdp.meta_thresh = args->meta_thresh ? : ~args->meta_thresh; xdp.btf_id = args->btf_id; xdp.extack = args->extack; xdp.flags = args->flags; @@ -523,11 +532,13 @@ static int dev_xdp_attach(const struct xdp_install_args *args, } } - /* don't call drivers if the effective program or BTF ID didn't change. - * If @link == %NULL, we don't know the old value, so the only thing we - * can do is to call installing unconditionally + /* don't call drivers if the effective program or BTF ID / metadata + * threshold didn't change. If @link == %NULL, we don't know the + * old values, so the only thing we can do is to call installing + * unconditionally */ - if (new_prog != cur_prog || !link || args->btf_id != link->btf_id) { + if (new_prog != cur_prog || !link || args->btf_id != link->btf_id || + args->meta_thresh != link->meta_thresh) { bpf_op = dev_xdp_bpf_op(dev, mode); if (!bpf_op) { NL_SET_ERR_MSG(extack, "Underlying driver does not support XDP in native mode"); @@ -555,6 +566,7 @@ static int dev_xdp_attach_link(struct bpf_xdp_link *link) .dev = link->dev, .flags = link->flags, .btf_id = link->btf_id, + .meta_thresh = link->meta_thresh, }; return dev_xdp_attach(&args, link, NULL, NULL); @@ -615,16 +627,18 @@ static void bpf_xdp_link_show_fdinfo(const struct bpf_link *link, struct seq_file *seq) { struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); - u32 ifindex = 0; + u32 meta_thresh, ifindex = 0; u64 btf_id; rtnl_lock(); if (xdp_link->dev) ifindex = xdp_link->dev->ifindex; + meta_thresh = xdp_link->meta_thresh; btf_id = xdp_link->btf_id; rtnl_unlock(); seq_printf(seq, "ifindex:\t%u\n", ifindex); + seq_printf(seq, "meta_thresh:\t%u\n", meta_thresh); seq_printf(seq, "btf_id:\t0x%llx\n", btf_id); } @@ -632,17 +646,19 @@ static int bpf_xdp_link_fill_link_info(const struct bpf_link *link, struct bpf_link_info *info) { struct bpf_xdp_link *xdp_link = container_of(link, struct bpf_xdp_link, link); - u32 ifindex = 0; + u32 meta_thresh, ifindex = 0; u64 btf_id; rtnl_lock(); if (xdp_link->dev) ifindex = xdp_link->dev->ifindex; + meta_thresh = xdp_link->meta_thresh; btf_id = xdp_link->btf_id; rtnl_unlock(); info->xdp.ifindex = ifindex; info->xdp.btf_id = btf_id; + info->xdp.meta_thresh = meta_thresh; return 0; } @@ -656,6 +672,7 @@ static int bpf_xdp_link_update(struct bpf_link *link, .dev = xdp_link->dev, .flags = xdp_link->flags, .btf_id = attr->link_update.xdp.new_btf_id, + .meta_thresh = attr->link_update.xdp.new_meta_thresh, }; enum bpf_xdp_mode mode; bpf_op_t bpf_op; @@ -680,7 +697,8 @@ static int bpf_xdp_link_update(struct bpf_link *link, goto out_unlock; } - if (old_prog == new_prog && args.btf_id == xdp_link->btf_id) { + if (old_prog == new_prog && args.btf_id == xdp_link->btf_id && + args.meta_thresh == xdp_link->meta_thresh) { /* no-op, don't disturb drivers */ bpf_prog_put(new_prog); goto out_unlock; @@ -696,6 +714,7 @@ static int bpf_xdp_link_update(struct bpf_link *link, bpf_prog_put(old_prog); xdp_link->btf_id =
args.btf_id; + xdp_link->meta_thresh = args.meta_thresh; out_unlock: rtnl_unlock(); @@ -736,6 +755,7 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) link->dev = dev; link->flags = attr->link_create.flags; link->btf_id = attr->link_create.xdp.btf_id; + link->meta_thresh = attr->link_create.xdp.meta_thresh; err = bpf_link_prime(&link->link, &link_primer); if (err) { diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index a30723b0e50c..500420d5017c 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1980,6 +1980,7 @@ static const struct nla_policy ifla_xdp_policy[IFLA_XDP_MAX + 1] = { [IFLA_XDP_FLAGS] = { .type = NLA_U32 }, [IFLA_XDP_PROG_ID] = { .type = NLA_U32 }, [IFLA_XDP_BTF_ID] = { .type = NLA_U64 }, + [IFLA_XDP_META_THRESH] = { .type = NLA_U32 }, }; static const struct rtnl_link_ops *linkinfo_to_kind_ops(const struct nlattr *nla) @@ -2962,6 +2963,7 @@ static int do_setlink(const struct sk_buff *skb, if (tb[IFLA_XDP]) { struct nlattr *xdp[IFLA_XDP_MAX + 1]; + u32 meta_thresh = 0; u32 xdp_flags = 0; u64 btf_id = 0; @@ -2991,12 +2993,16 @@ static int do_setlink(const struct sk_buff *skb, if (xdp[IFLA_XDP_BTF_ID]) btf_id = nla_get_u64(xdp[IFLA_XDP_BTF_ID]); + if (xdp[IFLA_XDP_META_THRESH]) + meta_thresh = nla_get_u32(xdp[IFLA_XDP_META_THRESH]); + if (xdp[IFLA_XDP_FD]) { struct xdp_install_args args = { .dev = dev, .extack = extack, .btf_id = btf_id, .flags = xdp_flags, + .meta_thresh = meta_thresh, }; int expected_fd = -1; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index c67ddb78915d..372170ded1d8 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1502,6 +1502,10 @@ union bpf_attr { struct { /* target metadata BTF + type ID */ __aligned_u64 btf_id; + /* frame size to start composing XDP + * metadata from + */ + __u32 meta_thresh; } xdp; }; } link_create; @@ -1518,6 +1522,10 @@ union bpf_attr { struct { /* new target metadata BTF + type ID */ __aligned_u64 new_btf_id; + /* new frame size to start composing XDP + * metadata from + */ + __u32 new_meta_thresh; } xdp; }; } link_update; @@ -6148,7 +6156,7 @@ struct bpf_link_info { } netns; struct { __u32 ifindex; - __u32 :32; + __u32 meta_thresh; __aligned_u64 btf_id; } xdp; }; diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h index 68b126678dc8..b73b9c0f06fb 100644 --- a/tools/include/uapi/linux/if_link.h +++ b/tools/include/uapi/linux/if_link.h @@ -1213,6 +1213,7 @@ enum { IFLA_XDP_HW_PROG_ID, IFLA_XDP_EXPECTED_FD, IFLA_XDP_BTF_ID, + IFLA_XDP_META_THRESH, __IFLA_XDP_MAX, }; From patchwork Tue Jun 28 19:47:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898845 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E031ACCA47F for ; Tue, 28 Jun 2022 19:51:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230039AbiF1TvX (ORCPT ); Tue, 28 Jun 2022 15:51:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45042 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229706AbiF1Tuu (ORCPT ); Tue, 28 Jun 2022 15:50:50 -0400 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F58A3A712; Tue, 28 Jun 2022 12:49:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445755; x=1687981755; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KP8+M2ZNDj+bGLuKjD0DKVm/0e+VNysNArAnGfFt/hk=; b=VGZyeyQOA3ZQRhYZoUgIuckUwXOmqYutJY5+HNigN9K72rmhMp9INWeW Y+Bu/DjnkxucyGGLkc0SSrHB8RQ7um98prNg9NRbSolPPRu4AryaIQrKi /Glb9JT90jKwdiajo4fwKbKc4ASS0HAdC2wPv2wbwxrOr5bmH9BneQ2WU NdXMdeYR3dgkj3YethqmI+6aIKcvPylqTc/Rer+ImB0nUVjHLY95WJq0C 5IqTdyUcSn4B1hl7wHqYqeVgAIWxsCyJliW0kzfVX0gUUVFntzuraTPAs TTyOP9AMRcu+GD9M4mobUsNdkgQmF79tKrRC3AVxfZn1ELTURWH5T79vu Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="282568081" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="282568081" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:14 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="680182439" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by FMSMGA003.fm.intel.com with ESMTP; 28 Jun 2022 12:49:10 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9B022013; Tue, 28 Jun 2022 20:49:08 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 11/52] libbpf: factor out __bpf_set_link_xdp_fd_replace() args into a struct Date: Tue, 28 Jun 2022 21:47:31 +0200 Message-Id: <20220628194812.1453059-12-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Its argument list already consists of 4 entries, and there are more to be added. It's convenient to add new opts as they are already being passed around in structs, but in the end the mentioned function takes all the opts one by one. Place them into a local struct which satisfies every initial call site, so that handling a new opt is now just a matter of adding a new field and a corresponding nlattr_add().
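For illustration, a minimal sketch of what a converted call site then looks like (ifindex, prog_fd and flags are assumed to be set up by the caller):

	struct __bpf_set_link_xdp_fd_opts sl_opts = {
		.ifindex = ifindex,	/* assumed: resolved by the caller */
		.fd	 = prog_fd,	/* assumed: FD of the program to attach */
		.old_fd	 = -1,		/* no XDP_FLAGS_REPLACE expectation */
		.flags	 = flags,
	};

	err = __bpf_set_link_xdp_fd_replace(&sl_opts);

Adding a new option then only touches the struct definition and the function body, not every caller's argument list.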
Signed-off-by: Alexander Lobakin --- tools/lib/bpf/netlink.c | 60 ++++++++++++++++++++++++++++------------- 1 file changed, 42 insertions(+), 18 deletions(-) diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index cbc8967d5402..3a25178d0d12 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -230,8 +230,15 @@ static int libbpf_netlink_send_recv(struct libbpf_nla_req *req, return ret; } -static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd, - __u32 flags) +struct __bpf_set_link_xdp_fd_opts { + int ifindex; + int fd; + int old_fd; + __u32 flags; +}; + +static int +__bpf_set_link_xdp_fd_replace(const struct __bpf_set_link_xdp_fd_opts *opts) { struct nlattr *nla; int ret; @@ -242,22 +249,23 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd, req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; req.nh.nlmsg_type = RTM_SETLINK; req.ifinfo.ifi_family = AF_UNSPEC; - req.ifinfo.ifi_index = ifindex; + req.ifinfo.ifi_index = opts->ifindex; nla = nlattr_begin_nested(&req, IFLA_XDP); if (!nla) return -EMSGSIZE; - ret = nlattr_add(&req, IFLA_XDP_FD, &fd, sizeof(fd)); + ret = nlattr_add(&req, IFLA_XDP_FD, &opts->fd, sizeof(opts->fd)); if (ret < 0) return ret; - if (flags) { - ret = nlattr_add(&req, IFLA_XDP_FLAGS, &flags, sizeof(flags)); + if (opts->flags) { + ret = nlattr_add(&req, IFLA_XDP_FLAGS, &opts->flags, + sizeof(opts->flags)); if (ret < 0) return ret; } - if (flags & XDP_FLAGS_REPLACE) { - ret = nlattr_add(&req, IFLA_XDP_EXPECTED_FD, &old_fd, - sizeof(old_fd)); + if (opts->flags & XDP_FLAGS_REPLACE) { + ret = nlattr_add(&req, IFLA_XDP_EXPECTED_FD, &opts->old_fd, + sizeof(opts->old_fd)); if (ret < 0) return ret; } @@ -268,18 +276,23 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd, int bpf_xdp_attach(int ifindex, int prog_fd, __u32 flags, const struct bpf_xdp_attach_opts *opts) { - int old_prog_fd, err; + struct __bpf_set_link_xdp_fd_opts sl_opts = { + .ifindex = ifindex, + .flags = flags, + .fd = prog_fd, + }; + int err; if (!OPTS_VALID(opts, bpf_xdp_attach_opts)) return libbpf_err(-EINVAL); - old_prog_fd = OPTS_GET(opts, old_prog_fd, 0); - if (old_prog_fd) + sl_opts.old_fd = OPTS_GET(opts, old_prog_fd, 0); + if (sl_opts.old_fd) flags |= XDP_FLAGS_REPLACE; else - old_prog_fd = -1; + sl_opts.old_fd = -1; - err = __bpf_set_link_xdp_fd_replace(ifindex, prog_fd, old_prog_fd, flags); + err = __bpf_set_link_xdp_fd_replace(&sl_opts); return libbpf_err(err); } @@ -291,25 +304,36 @@ int bpf_xdp_detach(int ifindex, __u32 flags, const struct bpf_xdp_attach_opts *o int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags, const struct bpf_xdp_set_link_opts *opts) { - int old_fd = -1, ret; + struct __bpf_set_link_xdp_fd_opts sl_opts = { + .ifindex = ifindex, + .flags = flags, + .old_fd = -1, + .fd = fd, + }; + int ret; if (!OPTS_VALID(opts, bpf_xdp_set_link_opts)) return libbpf_err(-EINVAL); if (OPTS_HAS(opts, old_fd)) { - old_fd = OPTS_GET(opts, old_fd, -1); + sl_opts.old_fd = OPTS_GET(opts, old_fd, -1); flags |= XDP_FLAGS_REPLACE; } - ret = __bpf_set_link_xdp_fd_replace(ifindex, fd, old_fd, flags); + ret = __bpf_set_link_xdp_fd_replace(&sl_opts); return libbpf_err(ret); } int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags) { + struct __bpf_set_link_xdp_fd_opts sl_opts = { + .ifindex = ifindex, + .flags = flags, + .fd = fd, + }; int ret; - ret = __bpf_set_link_xdp_fd_replace(ifindex, fd, 0, flags); + ret = __bpf_set_link_xdp_fd_replace(&sl_opts); return libbpf_err(ret); } From patchwork Tue Jun 28 
19:47:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898844 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9B14C433EF for ; Tue, 28 Jun 2022 19:51:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231150AbiF1TvU (ORCPT ); Tue, 28 Jun 2022 15:51:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47760 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232183AbiF1Tuu (ORCPT ); Tue, 28 Jun 2022 15:50:50 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A1573A715; Tue, 28 Jun 2022 12:49:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445757; x=1687981757; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TNfBgVKLQyn2LgZtWghMUTiye2sgpvYA6TvdfH/NL10=; b=fquriFWpmeUuj/t+DBAb6GdrZkHwE7Wiq9kNGOGKyngzfCbwckFZoV4w 0WfecC5+zO/EmKCtMUA8ZSrVzCca3OWtfkinhg3EOFHas5XWk7ZFBKg99 xT0WkGJOPn8Qmw1Ab4+BSR0D6ILhrMPxsb/FLWIa+f3heumivQKrJsXIY OTP/RhxsBxDVgIsieZIm6RY4IWQ+v/LXSWlcpTVj4v+ahEqp+3I9gbscf zHK6p2VQm238Gnbfb628I61Dua9PkGCWHCxydbw+Ch3h8wbeNntdVML33 LGsqeXls5OIqmyQtZ7AGPf2rkPlf+eVaIaVwZ7hakL63ZmO1tVqJSUZrI Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="281869513" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="281869513" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:16 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="587988501" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga007.jf.intel.com with ESMTP; 28 Jun 2022 12:49:11 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9C022013; Tue, 28 Jun 2022 20:49:10 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 12/52] libbpf: add ability to set the BTF/type ID on setting XDP prog Date: Tue, 28 Jun 2022 21:47:32 +0200 Message-Id: <20220628194812.1453059-13-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Covered functions: * bpf_link_create() - via &bpf_link_create_ops; * bpf_link_update() - via &bpf_link_update_ops; * bpf_xdp_attach() - via &bpf_xdp_attach_ops; * bpf_set_link_xdp_fd_opts() - via &bpf_xdp_set_link_opts; bpf_link_update() got the ability to pass arbitrary link type-specific data to the kernel, not just the old and new FDs. No support for bpf_get_link_xdp_info()/&xdp_link_info as we store additional data such as flags and BTF ID in the kernel in BPF link mode only. Signed-off-by: Alexander Lobakin --- tools/lib/bpf/bpf.c | 19 +++++++++++++++++++ tools/lib/bpf/bpf.h | 16 +++++++++++++++- tools/lib/bpf/libbpf.h | 8 ++++++-- tools/lib/bpf/netlink.c | 11 +++++++++++ 4 files changed, 51 insertions(+), 3 deletions(-) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 240186aac8e6..6036dc75cc7b 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -805,6 +805,11 @@ int bpf_link_create(int prog_fd, int target_fd, if (!OPTS_ZEROED(opts, tracing)) return libbpf_err(-EINVAL); break; + case BPF_XDP: + attr.link_create.xdp.btf_id = OPTS_GET(opts, xdp.btf_id, 0); + if (!OPTS_ZEROED(opts, xdp)) + return libbpf_err(-EINVAL); + break; default: if (!OPTS_ZEROED(opts, flags)) return libbpf_err(-EINVAL); @@ -872,6 +877,20 @@ int bpf_link_update(int link_fd, int new_prog_fd, attr.link_update.flags = OPTS_GET(opts, flags, 0); attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0); + /* As the union in both @attr and @opts is unnamed, just use a pointer + * to any of its members to copy the whole rest of the union/opts + */ + if (opts && opts->sz > offsetof(typeof(*opts), xdp)) { + __u32 attr_left, opts_left; + + attr_left = sizeof(attr.link_update) - + offsetof(typeof(attr.link_update), xdp); + opts_left = opts->sz - offsetof(typeof(*opts), xdp); + + memcpy(&attr.link_update.xdp, &opts->xdp, + min(attr_left, opts_left)); + } + ret = sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr)); return libbpf_err_errno(ret); } diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index cabc03703e29..4e17995fdaff 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -382,6 +382,10 @@ struct bpf_link_create_opts { struct { __u64 cookie; } tracing; + struct { + /* target metadata BTF + type ID */ + __aligned_u64 btf_id; + } xdp; }; size_t :0; }; @@ -397,8 +401,18 @@ struct bpf_link_update_opts { size_t sz; /* size of this struct for forward/backward compatibility */ __u32 flags; /* extra flags */ __u32 old_prog_fd; /* expected old program FD */ + /* must have the same layout as the same union from + * bpf_attr::link_update, uses direct memcpy() to there + */ + union { + struct { + /* new target metadata BTF + type ID */ + __aligned_u64 new_btf_id; + } xdp; + }; + size_t :0; }; -#define bpf_link_update_opts__last_field old_prog_fd +#define 
bpf_link_update_opts__last_field xdp.new_btf_id LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd, const struct bpf_link_update_opts *opts); diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 4056e9038086..4f77128ba770 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -1193,9 +1193,11 @@ struct xdp_link_info { struct bpf_xdp_set_link_opts { size_t sz; int old_fd; + __u32 :32; + __u64 btf_id; size_t :0; }; -#define bpf_xdp_set_link_opts__last_field old_fd +#define bpf_xdp_set_link_opts__last_field btf_id LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_xdp_attach() instead") LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags); @@ -1211,9 +1213,11 @@ LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info, struct bpf_xdp_attach_opts { size_t sz; int old_prog_fd; + __u32 :32; + __u64 btf_id; size_t :0; }; -#define bpf_xdp_attach_opts__last_field old_prog_fd +#define bpf_xdp_attach_opts__last_field btf_id struct bpf_xdp_query_opts { size_t sz; diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 3a25178d0d12..104a809d5fb2 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -235,6 +235,7 @@ struct __bpf_set_link_xdp_fd_opts { int fd; int old_fd; __u32 flags; + __u64 btf_id; }; static int @@ -269,6 +270,12 @@ __bpf_set_link_xdp_fd_replace(const struct __bpf_set_link_xdp_fd_opts *opts) if (ret < 0) return ret; } + if (opts->btf_id) { + ret = nlattr_add(&req, IFLA_XDP_BTF_ID, &opts->btf_id, + sizeof(opts->btf_id)); + if (ret < 0) + return ret; + } nlattr_end_nested(&req, nla); return libbpf_netlink_send_recv(&req, NULL, NULL, NULL); @@ -292,6 +299,8 @@ int bpf_xdp_attach(int ifindex, int prog_fd, __u32 flags, const struct bpf_xdp_a else sl_opts.old_fd = -1; + sl_opts.btf_id = OPTS_GET(opts, btf_id, 0); + err = __bpf_set_link_xdp_fd_replace(&sl_opts); return libbpf_err(err); } @@ -320,6 +329,8 @@ int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags, flags |= XDP_FLAGS_REPLACE; } + sl_opts.btf_id = OPTS_GET(opts, btf_id, 0); + ret = __bpf_set_link_xdp_fd_replace(&sl_opts); return libbpf_err(ret); } From patchwork Tue Jun 28 19:47:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898857 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE2B4C43334 for ; Tue, 28 Jun 2022 19:51:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233044AbiF1Tvw (ORCPT ); Tue, 28 Jun 2022 15:51:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45112 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232292AbiF1Tuw (ORCPT ); Tue, 28 Jun 2022 15:50:52 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 70AC83A724; Tue, 28 Jun 2022 12:49:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445764; x=1687981764; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yNkO5OonyawD491tq3UN3yp8qYpkGlKlrdxkHHQ0Dbc=; b=gOkPh3YBO8qpcjeMa36Vs3wKtaPejFFtZKnlnhuLA6JniZkhhzZclpAk 
3O7TXYUMxm2WrLoLE9U4nfkmOpFrbt2G6jOxDEtuvX+UaOlVyUFi9a0ir ZhJe3VGsA7CalHtM8AiW0im1gmz60/Nu6DYFG80XGjNpfyruri+UYQgJS r7MyTx4ID8ZQRXmkv6lJ1R6Fk4rWX+7iPv3/3t/UulB/wsTlqsEMLgtwT fDkZMzJC3mcfyTDpi/HJ3euu9p9GCS3LM7TIpL2+Tq6yO5z5jalRwKFRM uMWyJb9OtCd+jK/rbhdgIsriaI76yudOV9bm2dP1qAuPrOPY1Ti70UMBo g==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="262242844" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="262242844" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:17 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="917306820" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga005.fm.intel.com with ESMTP; 28 Jun 2022 12:49:13 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9D022013; Tue, 28 Jun 2022 20:49:11 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 13/52] libbpf: add ability to set the meta threshold on setting XDP prog Date: Tue, 28 Jun 2022 21:47:33 +0200 Message-Id: <20220628194812.1453059-14-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Covered functions: * bpf_link_create() - via &bpf_link_create_opts; * bpf_link_update() - via &bpf_link_update_opts; * bpf_xdp_attach() - via &bpf_xdp_attach_opts; * bpf_set_link_xdp_fd_opts() - via &bpf_xdp_set_link_opts; No support for bpf_get_link_xdp_info()/&xdp_link_info as we store additional data in the kernel in BPF link mode only.
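A minimal usage sketch, assuming ifindex and prog_fd are already set up and the combined metadata BTF + type ID has been obtained beforehand (the threshold of 128 is purely illustrative):

	LIBBPF_OPTS(bpf_xdp_attach_opts, opts,
		    .btf_id	 = btf_id,	/* target metadata BTF + type ID */
		    .meta_thresh = 128);	/* compose meta for frames >= 128 bytes */

	err = bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_DRV_MODE, &opts);

Both fields must be either set or zero, matching the kernel-side sanity check introduced earlier in the series.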
Signed-off-by: Alexander Lobakin --- tools/lib/bpf/bpf.c | 3 +++ tools/lib/bpf/bpf.h | 8 +++++++- tools/lib/bpf/libbpf.h | 4 ++-- tools/lib/bpf/netlink.c | 10 ++++++++++ 4 files changed, 22 insertions(+), 3 deletions(-) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 6036dc75cc7b..e7c713a418f6 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -807,6 +807,9 @@ int bpf_link_create(int prog_fd, int target_fd, break; case BPF_XDP: attr.link_create.xdp.btf_id = OPTS_GET(opts, xdp.btf_id, 0); + attr.link_create.xdp.meta_thresh = OPTS_GET(opts, + xdp.meta_thresh, + 0); if (!OPTS_ZEROED(opts, xdp)) return libbpf_err(-EINVAL); break; diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 4e17995fdaff..c0f54f24d675 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -385,6 +385,8 @@ struct bpf_link_create_opts { struct { /* target metadata BTF + type ID */ __aligned_u64 btf_id; + /* frame size to start composing XDP metadata from */ + __u32 meta_thresh; } xdp; }; size_t :0; @@ -408,11 +410,15 @@ struct bpf_link_update_opts { struct { /* new target metadata BTF + type ID */ __aligned_u64 new_btf_id; + /* new frame size to start composing XDP + * metadata from + */ + __u32 new_meta_thresh; } xdp; }; size_t :0; }; -#define bpf_link_update_opts__last_field xdp.new_btf_id +#define bpf_link_update_opts__last_field xdp.new_meta_thresh LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd, const struct bpf_link_update_opts *opts); diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 4f77128ba770..99ac94f148fc 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -1193,7 +1193,7 @@ struct xdp_link_info { struct bpf_xdp_set_link_opts { size_t sz; int old_fd; - __u32 :32; + __u32 meta_thresh; __u64 btf_id; size_t :0; }; @@ -1213,7 +1213,7 @@ LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info, struct bpf_xdp_attach_opts { size_t sz; int old_prog_fd; - __u32 :32; + __u32 meta_thresh; __u64 btf_id; size_t :0; }; diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 104a809d5fb2..ac2a87243ecd 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -236,6 +236,7 @@ struct __bpf_set_link_xdp_fd_opts { int old_fd; __u32 flags; __u64 btf_id; + __u32 meta_thresh; }; static int @@ -276,6 +277,13 @@ __bpf_set_link_xdp_fd_replace(const struct __bpf_set_link_xdp_fd_opts *opts) if (ret < 0) return ret; } + if (opts->meta_thresh) { + ret = nlattr_add(&req, IFLA_XDP_META_THRESH, + &opts->meta_thresh, + sizeof(opts->meta_thresh)); + if (ret < 0) + return ret; + } nlattr_end_nested(&req, nla); return libbpf_netlink_send_recv(&req, NULL, NULL, NULL); @@ -300,6 +308,7 @@ int bpf_xdp_attach(int ifindex, int prog_fd, __u32 flags, const struct bpf_xdp_a sl_opts.old_fd = -1; sl_opts.btf_id = OPTS_GET(opts, btf_id, 0); + sl_opts.meta_thresh = OPTS_GET(opts, meta_thresh, 0); err = __bpf_set_link_xdp_fd_replace(&sl_opts); return libbpf_err(err); @@ -330,6 +339,7 @@ int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags, } sl_opts.btf_id = OPTS_GET(opts, btf_id, 0); + sl_opts.meta_thresh = OPTS_GET(opts, meta_thresh, 0); ret = __bpf_set_link_xdp_fd_replace(&sl_opts); return libbpf_err(ret); From patchwork Tue Jun 28 19:47:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898848 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E7979CCA47E for ; Tue, 28 Jun 2022 19:51:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232864AbiF1Tva (ORCPT ); Tue, 28 Jun 2022 15:51:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45704 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230298AbiF1Tuv (ORCPT ); Tue, 28 Jun 2022 15:50:51 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 89D963819E; Tue, 28 Jun 2022 12:49:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445759; x=1687981759; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KvcLFlvO+Aq/YsYPMNHuBin8lYDZ9TRXn0mscwW8tkw=; b=V/GfsjM42m5u0YL8eT/47JtQYYGDQRC66vGuDwGm1YaPkTMBNAWg7Zqf HLFUuExKiaV1DwQllpe5oZ34TrnSPeNGUYwCW6j4sKDGFVSmIyEBoYLRs WehCNUsq3ASxJx8ow52d1FNZxPkFYAZ1plDzsUN1SycIbVwXxVv5EkKkC Qft/Rm6l2JB93rgN1aA9kjLIa881RZHv63QEZZIM8BhId8ze+uevxSB8e /GtWCVjYsHb/p7SjO4nyZp+0w75/Efoz+iSp9VZLcDyEbPVz29hnksg0O +5B3DV+jcvvUc/RQt27x96ECRgGCoedzj9etYCSsE9usQo+8K8xLRAmun Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="343523213" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="343523213" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="717555046" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga004.jf.intel.com with ESMTP; 28 Jun 2022 12:49:14 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9E022013; Tue, 28 Jun 2022 20:49:12 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 14/52] libbpf: pass &bpf_link_create_opts directly to bpf_program__attach_fd() Date: Tue, 28 Jun 2022 21:47:34 +0200 Message-Id: <20220628194812.1453059-15-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Instead of providing an optional @btf_id argument, which is zero in 3 of 4 cases, pass a pointer to &bpf_link_create_opts directly. This way, we can simply pass %NULL when no opts are needed and use any of its fields when they are.
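A sketch of the resulting calling convention (mirroring the hunks below): option-less attach types pass %NULL, while freplace builds the opts it needs locally:

	/* no opts needed */
	link = bpf_program__attach_fd(prog, ifindex, NULL, "xdp");

	/* opts needed: carry the target BTF ID for freplace */
	LIBBPF_OPTS(bpf_link_create_opts, opts, .target_btf_id = btf_id);
	link = bpf_program__attach_fd(prog, target_fd, &opts, "freplace");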
Signed-off-by: Alexander Lobakin --- tools/lib/bpf/libbpf.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 9bda111c8167..f4014c09f1cf 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -11958,11 +11958,10 @@ static int attach_lsm(const struct bpf_program *prog, long cookie, struct bpf_li } static struct bpf_link * -bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id, +bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, + const struct bpf_link_create_opts *opts, const char *target_name) { - DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts, - .target_btf_id = btf_id); enum bpf_attach_type attach_type; char errmsg[STRERR_BUFSIZE]; struct bpf_link *link; @@ -11980,7 +11979,7 @@ bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id link->detach = &bpf_link__detach_fd; attach_type = bpf_program__expected_attach_type(prog); - link_fd = bpf_link_create(prog_fd, target_fd, attach_type, &opts); + link_fd = bpf_link_create(prog_fd, target_fd, attach_type, opts); if (link_fd < 0) { link_fd = -errno; free(link); @@ -11996,19 +11995,19 @@ bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id struct bpf_link * bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd) { - return bpf_program__attach_fd(prog, cgroup_fd, 0, "cgroup"); + return bpf_program__attach_fd(prog, cgroup_fd, NULL, "cgroup"); } struct bpf_link * bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd) { - return bpf_program__attach_fd(prog, netns_fd, 0, "netns"); + return bpf_program__attach_fd(prog, netns_fd, NULL, "netns"); } struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex) { /* target_fd/target_ifindex use the same field in LINK_CREATE */ - return bpf_program__attach_fd(prog, ifindex, 0, "xdp"); + return bpf_program__attach_fd(prog, ifindex, NULL, "xdp"); } struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog, @@ -12030,11 +12029,16 @@ struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog, } if (target_fd) { + LIBBPF_OPTS(bpf_link_create_opts, opts); + btf_id = libbpf_find_prog_btf_id(attach_func_name, target_fd); if (btf_id < 0) return libbpf_err_ptr(btf_id); - return bpf_program__attach_fd(prog, target_fd, btf_id, "freplace"); + opts.target_btf_id = btf_id; + + return bpf_program__attach_fd(prog, target_fd, &opts, + "freplace"); } else { /* no target, so use raw_tracepoint_open for compatibility * with old kernels From patchwork Tue Jun 28 19:47:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898849 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 155CBCCA479 for ; Tue, 28 Jun 2022 19:51:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232875AbiF1Tvb (ORCPT ); Tue, 28 Jun 2022 15:51:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45068 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230151AbiF1Tuw (ORCPT ); Tue, 28 Jun 2022 15:50:52 -0400 Received: from mga03.intel.com 
(mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BB03E3A718; Tue, 28 Jun 2022 12:49:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445760; x=1687981760; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Br7PA2LhyRZLtGRuJC1NdDKYX7AsconXTD56haHkAIc=; b=gMf7pQgWoNgVnc+Lnz/9QCg6nQ1Qj0YebyBl/ga5IDMfduefkTNc4A/b IYEgF47Yl13KGqt5qVfroNTNsQS0L31w61lr5IufzRylvYyvDQr6pIGfs kD1v88+X45LzxpJFfe7D6vfhxakvSVdKkkxzXVpbMR7QCGST/czfOexuZ WBTs/GU9yDRV5N3YD/cdHY/qjSXz+B7DigtQ1Dq4F9GXdkYstX1jsOatx tscKnMFNA1WGzvjNUVpndjMFU+6EE6nKrIbdk3mV1DSJg8zCtvO7HVYet UDApQS1hED7OqgwzJxoG+WmO4yPo0EdU+gxo4g9QLRRDM7/w8leIb7AtR A==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="282927787" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="282927787" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:20 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="540598913" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga003.jf.intel.com with ESMTP; 28 Jun 2022 12:49:15 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9F022013; Tue, 28 Jun 2022 20:49:14 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 15/52] libbpf: add bpf_program__attach_xdp_opts() Date: Tue, 28 Jun 2022 21:47:35 +0200 Message-Id: <20220628194812.1453059-16-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add a version of bpf_program__attach_xdp() which can take an optional pointer to &bpf_xdp_attach_opts to carry opts from it to bpf_link_create(), primarily to be able to specify a BTF/type ID and a metadata threshold when attaching an XDP program. This struct was originally designed for bpf_xdp_{at,de}tach(); reuse it here as well to avoid spawning new entities (with ::old_prog_fd reused for XDP flags via a union).
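A minimal sketch of the new helper in use, assuming the BTF + type ID was obtained beforehand and with a purely illustrative threshold value:

	static struct bpf_link *attach_with_meta(const struct bpf_program *prog,
						 int ifindex, __u64 btf_id)
	{
		LIBBPF_OPTS(bpf_xdp_attach_opts, opts,
			    .btf_id	 = btf_id,
			    .meta_thresh = 128);

		return bpf_program__attach_xdp_opts(prog, ifindex, &opts);
	}

On failure, the returned pointer reports the error as usual for libbpf attach APIs.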
Signed-off-by: Alexander Lobakin --- tools/lib/bpf/libbpf.c | 16 ++++++++++++++++ tools/lib/bpf/libbpf.h | 27 ++++++++++++++++++--------- tools/lib/bpf/libbpf.map | 1 + 3 files changed, 35 insertions(+), 9 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index f4014c09f1cf..b6cc238a2657 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -12010,6 +12010,22 @@ struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifi return bpf_program__attach_fd(prog, ifindex, NULL, "xdp"); } +struct bpf_link * +bpf_program__attach_xdp_opts(const struct bpf_program *prog, int ifindex, + const struct bpf_xdp_attach_opts *opts) +{ + LIBBPF_OPTS(bpf_link_create_opts, lc_opts); + + if (!OPTS_VALID(opts, bpf_xdp_attach_opts)) + return libbpf_err_ptr(-EINVAL); + + lc_opts.flags = OPTS_GET(opts, flags, 0); + lc_opts.xdp.btf_id = OPTS_GET(opts, btf_id, 0); + lc_opts.xdp.meta_thresh = OPTS_GET(opts, meta_thresh, 0); + + return bpf_program__attach_fd(prog, ifindex, &lc_opts, "xdp"); +} + struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog, int target_fd, const char *attach_func_name) diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 99ac94f148fc..d6dd05b5b753 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -678,8 +678,26 @@ LIBBPF_API struct bpf_link * bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd); LIBBPF_API struct bpf_link * bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd); + +struct bpf_xdp_attach_opts { + size_t sz; + union { + int old_prog_fd; + /* for bpf_program__attach_xdp_opts() */ + __u32 flags; + }; + __u32 meta_thresh; + __u64 btf_id; + size_t :0; +}; +#define bpf_xdp_attach_opts__last_field btf_id + LIBBPF_API struct bpf_link * bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex); +LIBBPF_API struct bpf_link * +bpf_program__attach_xdp_opts(const struct bpf_program *prog, int ifindex, + const struct bpf_xdp_attach_opts *opts); + LIBBPF_API struct bpf_link * bpf_program__attach_freplace(const struct bpf_program *prog, int target_fd, const char *attach_func_name); @@ -1210,15 +1228,6 @@ LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_xdp_query() instead") LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info, size_t info_size, __u32 flags); -struct bpf_xdp_attach_opts { - size_t sz; - int old_prog_fd; - __u32 meta_thresh; - __u64 btf_id; - size_t :0; -}; -#define bpf_xdp_attach_opts__last_field btf_id - struct bpf_xdp_query_opts { size_t sz; __u32 prog_id; /* output */ diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index f0987df15b7a..d14bbf82e37c 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -464,6 +464,7 @@ LIBBPF_1.0.0 { global: btf__add_enum64; btf__add_enum64_value; + bpf_program__attach_xdp_opts; libbpf_bpf_attach_type_str; libbpf_bpf_link_type_str; libbpf_bpf_map_type_str; From patchwork Tue Jun 28 19:47:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898850 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10505CCA47E for ; Tue, 28 Jun 2022 19:51:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via 
listexpand id S232918AbiF1Tvf (ORCPT ); Tue, 28 Jun 2022 15:51:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48224 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232223AbiF1Tuw (ORCPT ); Tue, 28 Jun 2022 15:50:52 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 144CB3A71C; Tue, 28 Jun 2022 12:49:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445762; x=1687981762; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=V/LelW6Mb5Cc8sNSdQPVZg4VOEAQ0MECgTTL8gSA07g=; b=aJa6bknhYwbF4rMjA/gztvpXrCxKIUFAH4+Tqj9UYtaStH0OiQsCBGhf P9fJB2VcIKAva1Rc5TuWO7KLE54Rp/B5LGJfaRpNsgYai6lUfNiqrzm+9 vcksegu+CK4OiAktBZ5DRQrylc67bJvbW5ul1uYpVN3N7CLT2i3mTUL5Y DaC+OVrGwYSENbix6+3JKbAnGaz9Bs6za65kWWUCBX+PMnNWhdm0btCuq 69ahgG8dmntXw6PEvovMBkdQT5viMOBJItwyjdiYmFQkdOPJl1lMnQNQ9 sq/AEuPOVG9IM25G91T7o3zFZO+GfaJaYRX7aqltaj2CA5OrlV9FYj1FR Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="264874102" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="264874102" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="767288073" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga005.jf.intel.com with ESMTP; 28 Jun 2022 12:49:17 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9G022013; Tue, 28 Jun 2022 20:49:15 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 16/52] selftests/bpf: expand xdp_link to check that setting meta opts works Date: Tue, 28 Jun 2022 21:47:36 +0200 Message-Id: <20220628194812.1453059-17-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add a check in xdp_link to ensure that the values of btf_id and meta_thresh obtained via bpf_obj_get_info_by_fd() are the same as those passed via bpf_link_update(), and that the kernel refuses to set btf_id to 0 while meta_thresh != 0. Also, use the new bpf_program__attach_xdp_opts() instead of the non-opts version and set an initial metadata threshold value to check whether the kernel is able to process this request. When the threshold is set via the Netlink interface, it's not stored anywhere in the kernel core, so there is no test for that path.
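For reference, the readback path exercised by the test looks roughly like this (a sketch; error handling elided):

	struct bpf_link_info info = {};
	__u32 len = sizeof(info);
	int err;

	err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &len);
	if (!err)
		/* mirrors what was installed via bpf_link_update() */
		printf("btf_id: 0x%llx, meta_thresh: %u\n",
		       (unsigned long long)info.xdp.btf_id,
		       info.xdp.meta_thresh);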
Signed-off-by: Alexander Lobakin --- .../selftests/bpf/prog_tests/xdp_link.c | 30 +++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_link.c b/tools/testing/selftests/bpf/prog_tests/xdp_link.c index 3e9d5c5521f0..0723278c448f 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_link.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_link.c @@ -10,6 +10,7 @@ void serial_test_xdp_link(void) { struct test_xdp_link *skel1 = NULL, *skel2 = NULL; __u32 id1, id2, id0 = 0, prog_fd1, prog_fd2; + LIBBPF_OPTS(bpf_link_update_opts, lu_opts); LIBBPF_OPTS(bpf_xdp_attach_opts, opts); struct bpf_link_info link_info; struct bpf_prog_info prog_info; @@ -103,8 +104,16 @@ void serial_test_xdp_link(void) bpf_link__destroy(skel1->links.xdp_handler); skel1->links.xdp_handler = NULL; + opts.old_prog_fd = 0; + opts.meta_thresh = 128; + + err = libbpf_get_type_btf_id("struct xdp_meta_generic", &opts.btf_id); + if (!ASSERT_OK(err, "libbpf_get_type_btf_id")) + goto cleanup; + /* new link attach should succeed */ - link = bpf_program__attach_xdp(skel2->progs.xdp_handler, IFINDEX_LO); + link = bpf_program__attach_xdp_opts(skel2->progs.xdp_handler, + IFINDEX_LO, &opts); if (!ASSERT_OK_PTR(link, "link_attach")) goto cleanup; skel2->links.xdp_handler = link; @@ -113,11 +122,25 @@ void serial_test_xdp_link(void) if (!ASSERT_OK(err, "id2_check_err") || !ASSERT_EQ(id0, id2, "id2_check_val")) goto cleanup; + lu_opts.xdp.new_meta_thresh = 256; + lu_opts.xdp.new_btf_id = opts.btf_id; + /* updating program under active BPF link works as expected */ - err = bpf_link__update_program(link, skel1->progs.xdp_handler); + err = bpf_link_update(bpf_link__fd(link), + bpf_program__fd(skel1->progs.xdp_handler), + &lu_opts); if (!ASSERT_OK(err, "link_upd")) goto cleanup; + lu_opts.xdp.new_btf_id = 0; + + /* BTF ID can't be 0 when meta_thresh != 0, and vice versa */ + err = bpf_link_update(bpf_link__fd(link), + bpf_program__fd(skel1->progs.xdp_handler), + &lu_opts); + if (!ASSERT_ERR(err, "link_upd_fail")) + goto cleanup; + memset(&link_info, 0, sizeof(link_info)); err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &link_info, &link_info_len); if (!ASSERT_OK(err, "link_info")) @@ -126,6 +149,9 @@ void serial_test_xdp_link(void) ASSERT_EQ(link_info.type, BPF_LINK_TYPE_XDP, "link_type"); ASSERT_EQ(link_info.prog_id, id1, "link_prog_id"); ASSERT_EQ(link_info.xdp.ifindex, IFINDEX_LO, "link_ifindex"); + ASSERT_EQ(link_info.xdp.btf_id, opts.btf_id, "btf_id"); + ASSERT_EQ(link_info.xdp.meta_thresh, lu_opts.xdp.new_meta_thresh, + "meta_thresh"); /* updating program under active BPF link with different type fails */ err = bpf_link__update_program(link, skel1->progs.tc_handler); From patchwork Tue Jun 28 19:47:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898852 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15174C433EF for ; Tue, 28 Jun 2022 19:51:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231604AbiF1Tvj (ORCPT ); Tue, 28 Jun 2022 15:51:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45878 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S232288AbiF1Tuw (ORCPT ); Tue, 28 Jun 2022 15:50:52 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 56BC43A723; Tue, 28 Jun 2022 12:49:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445763; x=1687981763; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KbJE5i/cRb03yyCvDDOc+EAdI2CUGvTU4bgCehvnLDQ=; b=D6udilx3ynGdEGjhEKdwYxnco3wySIGNFDkZ1hqEB0L+I5AGbxksNqy5 GbcVA9/fSW7Nlvd7kEc2UNv4TZmZoucXNVGoXurRuZkf0k7KhCdxfcj2R 9o+IMHjN1RV2VGycfP9mi0IT83ghkIWrwH+yRdhBkYbQuMN9outhApJk/ BIrjr2tUsUOy5q8kaa0e9mxsb2I1XBZ/8ligj0qcsPvIq3qAeJFvRJ9Ps H100sVpDPn4oBVjeD2rTNAf7/XOvZpGDlXfrhkSpFT/X4zdU/cGUXH5nw 5lsyZgYUNeqOPZ5sNNXDC3Q5M8UGAY6jnC9N6jyV14KFk6RLvBJeQQILu A==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="345828388" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="345828388" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="565181222" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga006.jf.intel.com with ESMTP; 28 Jun 2022 12:49:18 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9H022013; Tue, 28 Jun 2022 20:49:16 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 17/52] samples/bpf: pass a struct to sample_install_xdp() Date: Tue, 28 Jun 2022 21:47:37 +0200 Message-Id: <20220628194812.1453059-18-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC In order to be able to pass more flags and/or other options to sample_install_xdp() from userland programs built on top of this framework, make it consume a const pointer to a structure carrying all the parameters needed to initialize the sample, instead of a set of standalone arguments, which doesn't scale. Adjust all the samples accordingly.
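A hypothetical caller then reads as follows ("eth0" and the skeleton program name are made up for the example):

	struct sample_install_opts opts = {
		.ifindex = if_nametoindex("eth0"),
		.generic = 1,	/* translates to XDP_FLAGS_SKB_MODE */
	};

	if (sample_install_xdp(skel->progs.xdp_prog, &opts) < 0)
		return EXIT_FAIL_XDP;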
Signed-off-by: Alexander Lobakin --- samples/bpf/xdp_redirect_cpu_user.c | 24 +++++++++++------------ samples/bpf/xdp_redirect_map_multi_user.c | 19 +++++++++--------- samples/bpf/xdp_redirect_map_user.c | 15 +++++++------- samples/bpf/xdp_redirect_user.c | 15 +++++++------- samples/bpf/xdp_router_ipv4_user.c | 13 ++++++------ samples/bpf/xdp_sample_user.c | 12 +++++++----- samples/bpf/xdp_sample_user.h | 10 ++++++++-- 7 files changed, 58 insertions(+), 50 deletions(-) diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c index a12381c37d2b..15745d8cb5c2 100644 --- a/samples/bpf/xdp_redirect_cpu_user.c +++ b/samples/bpf/xdp_redirect_cpu_user.c @@ -306,6 +306,9 @@ int main(int argc, char **argv) { const char *redir_interface = NULL, *redir_map = NULL; const char *mprog_filename = NULL, *mprog_name = NULL; + struct sample_install_opts opts = { + .ifindex = -1, + }; struct xdp_redirect_cpu *skel; struct bpf_map_info info = {}; struct bpf_cpumap_val value; @@ -315,13 +318,10 @@ int main(int argc, char **argv) bool stress_mode = false; struct bpf_program *prog; const char *prog_name; - bool generic = false; - bool force = false; int added_cpus = 0; bool error = true; int longindex = 0; int add_cpu = -1; - int ifindex = -1; int *cpu, i, opt; __u32 qsize; int n_cpus; @@ -391,10 +391,10 @@ int main(int argc, char **argv) usage(argv, long_options, __doc__, mask, true, skel->obj); goto end_cpu; } - ifindex = if_nametoindex(optarg); - if (!ifindex) - ifindex = strtoul(optarg, NULL, 0); - if (!ifindex) { + opts.ifindex = if_nametoindex(optarg); + if (!opts.ifindex) + opts.ifindex = strtoul(optarg, NULL, 0); + if (!opts.ifindex) { fprintf(stderr, "Bad interface index or name (%d): %s\n", errno, strerror(errno)); usage(argv, long_options, __doc__, mask, true, skel->obj); @@ -408,7 +408,7 @@ int main(int argc, char **argv) interval = strtoul(optarg, NULL, 0); break; case 'S': - generic = true; + opts.generic = true; break; case 'x': stress_mode = true; @@ -456,7 +456,7 @@ int main(int argc, char **argv) qsize = strtoul(optarg, NULL, 0); break; case 'F': - force = true; + opts.force = true; break; case 'v': sample_switch_mode(); @@ -470,7 +470,7 @@ int main(int argc, char **argv) } ret = EXIT_FAIL_OPTION; - if (ifindex == -1) { + if (opts.ifindex == -1) { fprintf(stderr, "Required option --dev missing\n"); usage(argv, long_options, __doc__, mask, true, skel->obj); goto end_cpu; @@ -483,7 +483,7 @@ int main(int argc, char **argv) goto end_cpu; } - skel->rodata->from_match[0] = ifindex; + skel->rodata->from_match[0] = opts.ifindex; if (redir_interface) skel->rodata->to_match[0] = if_nametoindex(redir_interface); @@ -540,7 +540,7 @@ int main(int argc, char **argv) } ret = EXIT_FAIL_XDP; - if (sample_install_xdp(prog, ifindex, generic, force) < 0) + if (sample_install_xdp(prog, &opts) < 0) goto end_cpu; ret = sample_run(interval, stress_mode ? 
stress_cpumap : NULL, &value); diff --git a/samples/bpf/xdp_redirect_map_multi_user.c b/samples/bpf/xdp_redirect_map_multi_user.c index 9e24f2705b67..85e66f9dc259 100644 --- a/samples/bpf/xdp_redirect_map_multi_user.c +++ b/samples/bpf/xdp_redirect_map_multi_user.c @@ -77,6 +77,7 @@ static int update_mac_map(struct bpf_map *map) int main(int argc, char **argv) { struct bpf_devmap_val devmap_val = {}; + struct sample_install_opts opts = { }; struct xdp_redirect_map_multi *skel; struct bpf_program *ingress_prog; bool xdp_devmap_attached = false; @@ -84,9 +85,6 @@ int main(int argc, char **argv) int ret = EXIT_FAIL_OPTION; unsigned long interval = 2; char ifname[IF_NAMESIZE]; - unsigned int ifindex; - bool generic = false; - bool force = false; bool tried = false; bool error = true; int i, opt; @@ -95,13 +93,13 @@ int main(int argc, char **argv) long_options, NULL)) != -1) { switch (opt) { case 'S': - generic = true; + opts.generic = true; /* devmap_xmit tracepoint not available */ mask &= ~(SAMPLE_DEVMAP_XMIT_CNT | SAMPLE_DEVMAP_XMIT_CNT_MULTI); break; case 'F': - force = true; + opts.force = true; break; case 'X': xdp_devmap_attached = true; @@ -186,13 +184,13 @@ int main(int argc, char **argv) forward_map = skel->maps.forward_map_native; for (i = 0; ifaces[i] > 0; i++) { - ifindex = ifaces[i]; + opts.ifindex = ifaces[i]; ret = EXIT_FAIL_XDP; restart: /* bind prog_fd to each interface */ - if (sample_install_xdp(ingress_prog, ifindex, generic, force) < 0) { - if (generic && !tried) { + if (sample_install_xdp(ingress_prog, &opts) < 0) { + if (opts.generic && !tried) { fprintf(stderr, "Trying fallback to sizeof(int) as value_size for devmap in generic mode\n"); ingress_prog = skel->progs.xdp_redirect_map_general; @@ -206,10 +204,11 @@ int main(int argc, char **argv) /* Add all the interfaces to forward group and attach * egress devmap program if exist */ - devmap_val.ifindex = ifindex; + devmap_val.ifindex = opts.ifindex; if (xdp_devmap_attached) devmap_val.bpf_prog.fd = bpf_program__fd(skel->progs.xdp_devmap_prog); - ret = bpf_map_update_elem(bpf_map__fd(forward_map), &ifindex, &devmap_val, 0); + ret = bpf_map_update_elem(bpf_map__fd(forward_map), + &opts.ifindex, &devmap_val, 0); if (ret < 0) { fprintf(stderr, "Failed to update devmap value: %s\n", strerror(errno)); diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c index b6e4fc849577..d09ef866e62b 100644 --- a/samples/bpf/xdp_redirect_map_user.c +++ b/samples/bpf/xdp_redirect_map_user.c @@ -43,6 +43,7 @@ static const struct option long_options[] = { int main(int argc, char **argv) { struct bpf_devmap_val devmap_val = {}; + struct sample_install_opts opts = { }; bool xdp_devmap_attached = false; struct xdp_redirect_map *skel; char str[2 * IF_NAMESIZE + 1]; @@ -53,8 +54,6 @@ int main(int argc, char **argv) unsigned long interval = 2; int ret = EXIT_FAIL_OPTION; struct bpf_program *prog; - bool generic = false; - bool force = false; bool tried = false; bool error = true; int opt, key = 0; @@ -63,13 +62,13 @@ int main(int argc, char **argv) long_options, NULL)) != -1) { switch (opt) { case 'S': - generic = true; + opts.generic = true; /* devmap_xmit tracepoint not available */ mask &= ~(SAMPLE_DEVMAP_XMIT_CNT | SAMPLE_DEVMAP_XMIT_CNT_MULTI); break; case 'F': - force = true; + opts.force = true; break; case 'X': xdp_devmap_attached = true; @@ -157,13 +156,14 @@ int main(int argc, char **argv) prog = skel->progs.xdp_redirect_map_native; tx_port_map = skel->maps.tx_port_native; restart: - if 
(sample_install_xdp(prog, ifindex_in, generic, force) < 0) { + opts.ifindex = ifindex_in; + if (sample_install_xdp(prog, &opts) < 0) { /* First try with struct bpf_devmap_val as value for generic * mode, then fallback to sizeof(int) for older kernels. */ fprintf(stderr, "Trying fallback to sizeof(int) as value_size for devmap in generic mode\n"); - if (generic && !tried) { + if (opts.generic && !tried) { prog = skel->progs.xdp_redirect_map_general; tx_port_map = skel->maps.tx_port_general; tried = true; @@ -174,7 +174,8 @@ int main(int argc, char **argv) } /* Loading dummy XDP prog on out-device */ - sample_install_xdp(skel->progs.xdp_redirect_dummy_prog, ifindex_out, generic, force); + opts.ifindex = ifindex_out; + sample_install_xdp(skel->progs.xdp_redirect_dummy_prog, &opts); devmap_val.ifindex = ifindex_out; if (xdp_devmap_attached) diff --git a/samples/bpf/xdp_redirect_user.c b/samples/bpf/xdp_redirect_user.c index 8663dd631b6e..2da686a9b8a0 100644 --- a/samples/bpf/xdp_redirect_user.c +++ b/samples/bpf/xdp_redirect_user.c @@ -41,6 +41,7 @@ static const struct option long_options[] = { int main(int argc, char **argv) { + struct sample_install_opts opts = { }; int ifindex_in, ifindex_out, opt; char str[2 * IF_NAMESIZE + 1]; char ifname_out[IF_NAMESIZE]; @@ -48,20 +49,18 @@ int main(int argc, char **argv) int ret = EXIT_FAIL_OPTION; unsigned long interval = 2; struct xdp_redirect *skel; - bool generic = false; - bool force = false; bool error = true; while ((opt = getopt_long(argc, argv, "hSFi:vs", long_options, NULL)) != -1) { switch (opt) { case 'S': - generic = true; + opts.generic = true; mask &= ~(SAMPLE_DEVMAP_XMIT_CNT | SAMPLE_DEVMAP_XMIT_CNT_MULTI); break; case 'F': - force = true; + opts.force = true; break; case 'i': interval = strtoul(optarg, NULL, 0); @@ -132,13 +131,13 @@ int main(int argc, char **argv) } ret = EXIT_FAIL_XDP; - if (sample_install_xdp(skel->progs.xdp_redirect_prog, ifindex_in, - generic, force) < 0) + opts.ifindex = ifindex_in; + if (sample_install_xdp(skel->progs.xdp_redirect_prog, &opts) < 0) goto end_destroy; /* Loading dummy XDP prog on out-device */ - sample_install_xdp(skel->progs.xdp_redirect_dummy_prog, ifindex_out, - generic, force); + opts.ifindex = ifindex_out; + sample_install_xdp(skel->progs.xdp_redirect_dummy_prog, &opts); ret = EXIT_FAIL; if (!if_indextoname(ifindex_in, ifname_in)) { diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c index 294fc15ad1cb..48e9bcb38c8e 100644 --- a/samples/bpf/xdp_router_ipv4_user.c +++ b/samples/bpf/xdp_router_ipv4_user.c @@ -549,13 +549,14 @@ static void usage(char *argv[], const struct option *long_options, int main(int argc, char **argv) { - bool error = true, generic = false, force = false; + struct sample_install_opts opts = { }; int opt, ret = EXIT_FAIL_BPF; struct xdp_router_ipv4 *skel; int i, total_ifindex = argc - 1; char **ifname_list = argv + 1; pthread_t routes_thread; int longindex = 0; + bool error = true; if (libbpf_set_strict_mode(LIBBPF_STRICT_ALL) < 0) { fprintf(stderr, "Failed to set libbpf strict mode: %s\n", @@ -606,12 +607,12 @@ int main(int argc, char **argv) ifname_list += 2; break; case 'S': - generic = true; + opts.generic = true; total_ifindex--; ifname_list++; break; case 'F': - force = true; + opts.force = true; total_ifindex--; ifname_list++; break; @@ -661,15 +662,15 @@ int main(int argc, char **argv) ret = EXIT_FAIL_XDP; for (i = 0; i < total_ifindex; i++) { - int index = if_nametoindex(ifname_list[i]); + opts.ifindex = 
if_nametoindex(ifname_list[i]); - if (!index) { + if (!opts.ifindex) { fprintf(stderr, "Interface %s not found %s\n", ifname_list[i], strerror(-tx_port_map_fd)); goto end_destroy; } if (sample_install_xdp(skel->progs.xdp_router_ipv4_prog, - index, generic, force) < 0) + &opts) < 0) goto end_destroy; } diff --git a/samples/bpf/xdp_sample_user.c b/samples/bpf/xdp_sample_user.c index 158682852162..8bc23b4c5f19 100644 --- a/samples/bpf/xdp_sample_user.c +++ b/samples/bpf/xdp_sample_user.c @@ -1280,9 +1280,10 @@ static int __sample_remove_xdp(int ifindex, __u32 prog_id, int xdp_flags) return bpf_xdp_detach(ifindex, xdp_flags, NULL); } -int sample_install_xdp(struct bpf_program *xdp_prog, int ifindex, bool generic, - bool force) +int sample_install_xdp(struct bpf_program *xdp_prog, + const struct sample_install_opts *opts) { + __u32 ifindex = opts->ifindex; int ret, xdp_flags = 0; __u32 prog_id = 0; @@ -1292,8 +1293,8 @@ int sample_install_xdp(struct bpf_program *xdp_prog, int ifindex, bool generic, return -ENOTSUP; } - xdp_flags |= !force ? XDP_FLAGS_UPDATE_IF_NOEXIST : 0; - xdp_flags |= generic ? XDP_FLAGS_SKB_MODE : XDP_FLAGS_DRV_MODE; + xdp_flags |= !opts->force ? XDP_FLAGS_UPDATE_IF_NOEXIST : 0; + xdp_flags |= opts->generic ? XDP_FLAGS_SKB_MODE : XDP_FLAGS_DRV_MODE; ret = bpf_xdp_attach(ifindex, bpf_program__fd(xdp_prog), xdp_flags, NULL); if (ret < 0) { ret = -errno; @@ -1301,7 +1302,8 @@ int sample_install_xdp(struct bpf_program *xdp_prog, int ifindex, bool generic, "Failed to install program \"%s\" on ifindex %d, mode = %s, " "force = %s: %s\n", bpf_program__name(xdp_prog), ifindex, - generic ? "skb" : "native", force ? "true" : "false", + opts->generic ? "skb" : "native", + opts->force ? "true" : "false", strerror(-ret)); return ret; } diff --git a/samples/bpf/xdp_sample_user.h b/samples/bpf/xdp_sample_user.h index f45051679977..22afe844ae30 100644 --- a/samples/bpf/xdp_sample_user.h +++ b/samples/bpf/xdp_sample_user.h @@ -30,14 +30,20 @@ enum stats_mask { #define EXIT_FAIL_BPF 4 #define EXIT_FAIL_MEM 5 +struct sample_install_opts { + int ifindex; + __u32 force:1; + __u32 generic:1; +}; + int sample_setup_maps(struct bpf_map **maps); int __sample_init(int mask); void sample_exit(int status); int sample_run(int interval, void (*post_cb)(void *), void *ctx); void sample_switch_mode(void); -int sample_install_xdp(struct bpf_program *xdp_prog, int ifindex, bool generic, - bool force); +int sample_install_xdp(struct bpf_program *xdp_prog, + const struct sample_install_opts *opts); void sample_usage(char *argv[], const struct option *long_options, const char *doc, int mask, bool error); From patchwork Tue Jun 28 19:47:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898855 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 816EDC43334 for ; Tue, 28 Jun 2022 19:51:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231597AbiF1Tvq (ORCPT ); Tue, 28 Jun 2022 15:51:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231552AbiF1Tuw (ORCPT ); Tue, 28 Jun 2022 15:50:52 -0400 Received: from mga03.intel.com (mga03.intel.com 
[134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABA7A27162; Tue, 28 Jun 2022 12:49:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445768; x=1687981768; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=IwD1xgC8YcxcPkOhlwX1jFE7MQDmCVGifCPPmYbvVTg=; b=NQJzYvFJYsk03tB7pX6ZzA578GfrgNBmKOvqK/CzJicspYBiyugbwWaw yjkcKQXg8be/Q6K+XR2POI4Lu7hDdLWc8BUlgcW/KUhRTO7KED/Hp1vLl RIfsZB8kHfD1/99dcBq9WxYIKbrCRtJ+pqC78lXn6BDhYe6V0Iz4vFFGc Zilzd0tWTjYd6LbZL5PwUoleTonpw85oPr+ibC8HrU1u4XHuE6I/PGmB9 Xo6kWhZiV+jxwAgijTpiHkkL7E258TUdr1jSdWHrHYQoLuek4eTBhB+DF Ew6z0dVQAYXoAduQftkgtUXLZiCG7lr9RXn/xZM7O0yHyxGe7x9kG0C8u w==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="282927829" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="282927829" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="836809462" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga006.fm.intel.com with ESMTP; 28 Jun 2022 12:49:19 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9I022013; Tue, 28 Jun 2022 20:49:18 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 18/52] samples/bpf: add ability to specify metadata threshold Date: Tue, 28 Jun 2022 21:47:38 +0200 Message-Id: <20220628194812.1453059-19-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC For all users of the sample_install_xdp() infra (primarily xdp_redirect_cpu), add the ability to enable/disable/control generic metadata generation using the new UAPI. The format is either just '-M' to enable it unconditionally, or '--meta-thresh=<thresh>' to enable it starting from frames bigger than <thresh> bytes.
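For reference, below is a minimal standalone sketch of this optional-argument pattern (illustrative only, not part of the patch; the option table and variable names are made up). It mirrors the `optarg ? strtoul(optarg, NULL, 0) : 1` fallback the samples use:

/* Sketch: parse '-M' / '--meta-thresh=<thresh>'. A bare '-M' falls back
 * to a threshold of 1 byte, i.e. metadata is generated unconditionally;
 * 0 keeps metadata generation disabled.
 */
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

static const struct option sketch_options[] = {
	{ "meta-thresh", optional_argument, NULL, 'M' },
	{}
};

int main(int argc, char **argv)
{
	unsigned long meta_thresh = 0;	/* 0 == metadata disabled */
	int opt;

	while ((opt = getopt_long(argc, argv, "M", sketch_options,
				  NULL)) != -1) {
		if (opt == 'M')
			meta_thresh = optarg ? strtoul(optarg, NULL, 0) : 1;
	}

	printf("meta_thresh = %lu\n", meta_thresh);
	return 0;
}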
Signed-off-by: Alexander Lobakin --- samples/bpf/xdp_redirect_cpu_user.c | 7 +++++- samples/bpf/xdp_redirect_map_multi_user.c | 7 +++++- samples/bpf/xdp_redirect_map_user.c | 7 +++++- samples/bpf/xdp_redirect_user.c | 6 ++++- samples/bpf/xdp_router_ipv4_user.c | 7 +++++- samples/bpf/xdp_sample_user.c | 28 +++++++++++++++++++---- samples/bpf/xdp_sample_user.h | 1 + 7 files changed, 53 insertions(+), 10 deletions(-) diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c index 15745d8cb5c2..ca457c34eb0f 100644 --- a/samples/bpf/xdp_redirect_cpu_user.c +++ b/samples/bpf/xdp_redirect_cpu_user.c @@ -60,6 +60,7 @@ static const struct option long_options[] = { { "mprog-filename", required_argument, NULL, 'f' }, { "redirect-device", required_argument, NULL, 'r' }, { "redirect-map", required_argument, NULL, 'm' }, + { "meta-thresh", optional_argument, NULL, 'M' }, {} }; @@ -382,7 +383,7 @@ int main(int argc, char **argv) } prog = skel->progs.xdp_prognum5_lb_hash_ip_pairs; - while ((opt = getopt_long(argc, argv, "d:si:Sxp:f:e:r:m:c:q:Fvh", + while ((opt = getopt_long(argc, argv, "d:si:Sxp:f:e:r:m:c:q:FMvh", long_options, &longindex)) != -1) { switch (opt) { case 'd': @@ -461,6 +462,10 @@ int main(int argc, char **argv) case 'v': sample_switch_mode(); break; + case 'M': + opts.meta_thresh = optarg ? strtoul(optarg, NULL, 0) : + 1; + break; case 'h': error = false; default: diff --git a/samples/bpf/xdp_redirect_map_multi_user.c b/samples/bpf/xdp_redirect_map_multi_user.c index 85e66f9dc259..b1c575f3d5f6 100644 --- a/samples/bpf/xdp_redirect_map_multi_user.c +++ b/samples/bpf/xdp_redirect_map_multi_user.c @@ -43,6 +43,7 @@ static const struct option long_options[] = { { "stats", no_argument, NULL, 's' }, { "interval", required_argument, NULL, 'i' }, { "verbose", no_argument, NULL, 'v' }, + { "meta-thresh", optional_argument, NULL, 'M' }, {} }; @@ -89,7 +90,7 @@ int main(int argc, char **argv) bool error = true; int i, opt; - while ((opt = getopt_long(argc, argv, "hSFXi:vs", + while ((opt = getopt_long(argc, argv, "hSFMXi:vs", long_options, NULL)) != -1) { switch (opt) { case 'S': @@ -98,6 +99,10 @@ int main(int argc, char **argv) mask &= ~(SAMPLE_DEVMAP_XMIT_CNT | SAMPLE_DEVMAP_XMIT_CNT_MULTI); break; + case 'M': + opts.meta_thresh = optarg ? strtoul(optarg, NULL, 0) : + 1; + break; case 'F': opts.force = true; break; diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c index d09ef866e62b..29dd7df804dc 100644 --- a/samples/bpf/xdp_redirect_map_user.c +++ b/samples/bpf/xdp_redirect_map_user.c @@ -37,6 +37,7 @@ static const struct option long_options[] = { { "stats", no_argument, NULL, 's' }, { "interval", required_argument, NULL, 'i' }, { "verbose", no_argument, NULL, 'v' }, + { "meta-thresh", optional_argument, NULL, 'M' }, {} }; @@ -58,7 +59,7 @@ int main(int argc, char **argv) bool error = true; int opt, key = 0; - while ((opt = getopt_long(argc, argv, "hSFXi:vs", + while ((opt = getopt_long(argc, argv, "hSFMXi:vs", long_options, NULL)) != -1) { switch (opt) { case 'S': @@ -67,6 +68,10 @@ int main(int argc, char **argv) mask &= ~(SAMPLE_DEVMAP_XMIT_CNT | SAMPLE_DEVMAP_XMIT_CNT_MULTI); break; + case 'M': + opts.meta_thresh = optarg ? 
strtoul(optarg, NULL, 0) : + 1; + break; case 'F': opts.force = true; break; diff --git a/samples/bpf/xdp_redirect_user.c b/samples/bpf/xdp_redirect_user.c index 2da686a9b8a0..f37c570877ca 100644 --- a/samples/bpf/xdp_redirect_user.c +++ b/samples/bpf/xdp_redirect_user.c @@ -36,6 +36,7 @@ static const struct option long_options[] = { {"stats", no_argument, NULL, 's' }, {"interval", required_argument, NULL, 'i' }, {"verbose", no_argument, NULL, 'v' }, + {"meta-thresh", optional_argument, NULL, 'M' }, {} }; @@ -51,7 +52,7 @@ int main(int argc, char **argv) struct xdp_redirect *skel; bool error = true; - while ((opt = getopt_long(argc, argv, "hSFi:vs", + while ((opt = getopt_long(argc, argv, "hSFMi:vs", long_options, NULL)) != -1) { switch (opt) { case 'S': @@ -59,6 +60,10 @@ int main(int argc, char **argv) mask &= ~(SAMPLE_DEVMAP_XMIT_CNT | SAMPLE_DEVMAP_XMIT_CNT_MULTI); break; + case 'M': + opts.meta_thresh = optarg ? strtoul(optarg, NULL, 0) : + 1; + break; case 'F': opts.force = true; break; diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c index 48e9bcb38c8e..5ff12688a31b 100644 --- a/samples/bpf/xdp_router_ipv4_user.c +++ b/samples/bpf/xdp_router_ipv4_user.c @@ -53,6 +53,7 @@ static const struct option long_options[] = { { "interval", required_argument, NULL, 'i' }, { "verbose", no_argument, NULL, 'v' }, { "stats", no_argument, NULL, 's' }, + { "meta-thresh", optional_argument, NULL, 'M' }, {} }; @@ -593,7 +594,7 @@ int main(int argc, char **argv) goto end_destroy; } - while ((opt = getopt_long(argc, argv, "si:SFvh", + while ((opt = getopt_long(argc, argv, "si:SFMvh", long_options, &longindex)) != -1) { switch (opt) { case 's': @@ -621,6 +622,10 @@ int main(int argc, char **argv) total_ifindex--; ifname_list++; break; + case 'M': + opts.meta_thresh = optarg ? strtoul(optarg, NULL, 0) : + 1; + break; case 'h': error = false; default: diff --git a/samples/bpf/xdp_sample_user.c b/samples/bpf/xdp_sample_user.c index 8bc23b4c5f19..354352541c5e 100644 --- a/samples/bpf/xdp_sample_user.c +++ b/samples/bpf/xdp_sample_user.c @@ -1283,6 +1283,8 @@ static int __sample_remove_xdp(int ifindex, __u32 prog_id, int xdp_flags) int sample_install_xdp(struct bpf_program *xdp_prog, const struct sample_install_opts *opts) { + LIBBPF_OPTS(bpf_xdp_attach_opts, attach_opts, + .meta_thresh = opts->meta_thresh); __u32 ifindex = opts->ifindex; int ret, xdp_flags = 0; __u32 prog_id = 0; @@ -1293,18 +1295,34 @@ int sample_install_xdp(struct bpf_program *xdp_prog, return -ENOTSUP; } + if (attach_opts.meta_thresh) { + ret = libbpf_get_type_btf_id("struct xdp_meta_generic", + &attach_opts.btf_id); + if (ret) { + fprintf(stderr, "Failed to retrieve BTF ID: %s\n", + strerror(-ret)); + return ret; + } + } + xdp_flags |= !opts->force ? XDP_FLAGS_UPDATE_IF_NOEXIST : 0; xdp_flags |= opts->generic ? XDP_FLAGS_SKB_MODE : XDP_FLAGS_DRV_MODE; - ret = bpf_xdp_attach(ifindex, bpf_program__fd(xdp_prog), xdp_flags, NULL); + ret = bpf_xdp_attach(ifindex, bpf_program__fd(xdp_prog), xdp_flags, + &attach_opts); if (ret < 0) { ret = -errno; fprintf(stderr, - "Failed to install program \"%s\" on ifindex %d, mode = %s, " - "force = %s: %s\n", + "Failed to install program \"%s\" on ifindex %d, mode = %s, force = %s, metadata = ", bpf_program__name(xdp_prog), ifindex, opts->generic ? "skb" : "native", - opts->force ? "true" : "false", - strerror(-ret)); + opts->force ?
"true" : "false"); + if (attach_opts.meta_thresh) + fprintf(stderr, + "true (from %u bytes, BTF ID is 0x%16llx)", + attach_opts.meta_thresh, attach_opts.btf_id); + else + fprintf(stderr, "false"); + fprintf(stderr, ": %s\n", strerror(-ret)); return ret; } diff --git a/samples/bpf/xdp_sample_user.h b/samples/bpf/xdp_sample_user.h index 22afe844ae30..207953406ee1 100644 --- a/samples/bpf/xdp_sample_user.h +++ b/samples/bpf/xdp_sample_user.h @@ -34,6 +34,7 @@ struct sample_install_opts { int ifindex; __u32 force:1; __u32 generic:1; + __u32 meta_thresh; }; int sample_setup_maps(struct bpf_map **maps); From patchwork Tue Jun 28 19:47:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898853 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08A05CCA479 for ; Tue, 28 Jun 2022 19:51:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232995AbiF1Tvm (ORCPT ); Tue, 28 Jun 2022 15:51:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44926 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230249AbiF1Tuw (ORCPT ); Tue, 28 Jun 2022 15:50:52 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 794CA381A6; Tue, 28 Jun 2022 12:49:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445766; x=1687981766; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZpVJYqira/LWPous20gpNOuBnyMMME3HZPBbCSlm2FQ=; b=QxbTB5SYn8q40YWGQiOe/0q7LF8uxTBmvLIKprYBosUotwXOixYIDrDR ogO9nLWYZve8dXYHZz25GGisnZW6j+IBKX8n7WpZv9e6CQ536vwhoKFjn OAWGO9DFr+Mqq1U16yqTGI2G5Sb93zqMFmDfLf7RES7Gzwm9GSq7el6ZA 7iwR64ZKFOSmDnp+AuBIVfRStu/YY8ZOscaUn2Oe4oPmcbk9WkBkwwP18 YlgtiD4Jqwg+dvW62xPlVBEoEqFHXYi0KKNG6ikpPmP1XVNAakcpuMce/ pZUYPc29SWrWcMeLEPjFEiBxkrOBeVM9IXdwRc7a5wCAfVb39nxgu4lTW Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="261635648" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="261635648" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="587988527" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga007.jf.intel.com with ESMTP; 28 Jun 2022 12:49:21 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9J022013; Tue, 28 Jun 2022 20:49:19 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 19/52] stddef: make __struct_group() UAPI C++-friendly Date: Tue, 28 Jun 2022 21:47:39 +0200 Message-Id: <20220628194812.1453059-20-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC For the most part of C++ history, it couldn't have type declarations inside anonymous unions for different reasons. At the same time, __struct_group() relies on the latters, so when the @TAG arguments is not empty, C++ code doesn't want to build: In file included from test_cpp.cpp:5: In file included from tools/testing/selftests/bpf/tools/include/bpf/libbpf.h:18: tools/include/uapi/linux/bpf.h:6774:17: error: types cannot be declared in an anonymous union __struct_group(xdp_meta_generic_rx, rx_full, /* no attrs */, ^ The safest way to fix this without trying to switch standards (which is impossible anyway in UAPI) etc., is to disable tag declaration for that language. This won't break anything since for now it's not buildable at all. Use a separate definition for __struct_group() when __cplusplus is defined to mitigate the error. Also, mirror stddef.h into tools/ so that kernel-shipped userspace code would use the fixed definition instead of _something_ present in the system. Fixes: 50d7bd38c3aa ("stddef: Introduce struct_group() helper macro") Signed-off-by: Alexander Lobakin --- include/uapi/linux/stddef.h | 12 ++++++-- tools/include/uapi/linux/stddef.h | 50 +++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+), 2 deletions(-) create mode 100644 tools/include/uapi/linux/stddef.h diff --git a/include/uapi/linux/stddef.h b/include/uapi/linux/stddef.h index 7837ba4fe728..67ee9c8aba56 100644 --- a/include/uapi/linux/stddef.h +++ b/include/uapi/linux/stddef.h @@ -20,14 +20,22 @@ * and size: one anonymous and one named. The former's members can be used * normally without sub-struct naming, and the latter can be used to * reason about the start, end, and size of the group of struct members. - * The named struct can also be explicitly tagged for layer reuse, as well - * as both having struct attributes appended. + * The named struct can also be explicitly tagged for layer reuse (C only), + * as well as both having struct attributes appended. */ +#ifndef __cplusplus #define __struct_group(TAG, NAME, ATTRS, MEMBERS...) \ union { \ struct { MEMBERS } ATTRS; \ struct TAG { MEMBERS } ATTRS NAME; \ } +#else +#define __struct_group(__IGNORED, NAME, ATTRS, MEMBERS...) 
\ + union { \ + struct { MEMBERS } ATTRS; \ + struct { MEMBERS } ATTRS NAME; \ + } +#endif /** * __DECLARE_FLEX_ARRAY() - Declare a flexible array usable in a union diff --git a/tools/include/uapi/linux/stddef.h b/tools/include/uapi/linux/stddef.h new file mode 100644 index 000000000000..40d1c4b21003 --- /dev/null +++ b/tools/include/uapi/linux/stddef.h @@ -0,0 +1,50 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ + +#ifndef __always_inline +#define __always_inline inline +#endif + +/** + * __struct_group() - Create a mirrored named and anonymous struct + * + * @TAG: The tag name for the named sub-struct (usually empty) + * @NAME: The identifier name of the mirrored sub-struct + * @ATTRS: Any struct attributes (usually empty) + * @MEMBERS: The member declarations for the mirrored structs + * + * Used to create an anonymous union of two structs with identical layout + * and size: one anonymous and one named. The former's members can be used + * normally without sub-struct naming, and the latter can be used to + * reason about the start, end, and size of the group of struct members. + * The named struct can also be explicitly tagged for layer reuse (C only), + * as well as both having struct attributes appended. + */ +#ifndef __cplusplus +#define __struct_group(TAG, NAME, ATTRS, MEMBERS...) \ + union { \ + struct { MEMBERS } ATTRS; \ + struct TAG { MEMBERS } ATTRS NAME; \ + } +#else +#define __struct_group(__IGNORED, NAME, ATTRS, MEMBERS...) \ + union { \ + struct { MEMBERS } ATTRS; \ + struct { MEMBERS } ATTRS NAME; \ + } +#endif + +/** + * __DECLARE_FLEX_ARRAY() - Declare a flexible array usable in a union + * + * @TYPE: The type of each flexible array element + * @NAME: The name of the flexible array member + * + * In order to have a flexible array member in a union or alone in a + * struct, it needs to be wrapped in an anonymous struct with at least 1 + * named member, but that member can be empty.
+ */ +#define __DECLARE_FLEX_ARRAY(TYPE, NAME) \ + struct { \ + struct { } __empty_ ## NAME; \ + TYPE NAME[]; \ + } From patchwork Tue Jun 28 19:47:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898851 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA016CCA479 for ; Tue, 28 Jun 2022 19:51:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231398AbiF1Tvh (ORCPT ); Tue, 28 Jun 2022 15:51:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230178AbiF1Tuw (ORCPT ); Tue, 28 Jun 2022 15:50:52 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 865E425C6E; Tue, 28 Jun 2022 12:49:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445767; x=1687981767; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=h7GfN6EnBGflCzZCdEPR/mE/rqKzDdKMQ5giru4caVw=; b=CMV1qtcdyI8q50+aCMlNgjDlrcgJC1a+viCx5Q1HGbvgOUC3Va6Poxt5 99iug9ITq0Flsqc1cWxZ5IIfolMidWTp4WQldokI2V0mA6bS1YtZwiPfo nDNxhJqhF2HxODfKRReoa59uEXKba7lw941oJyu7NQwCzeN/aYK38bG0t E0nSl9ZwnRHkALPi6DCg4FGKtWapwsz5elAp7s7wkpwLTZO0URN1AjHa7 StU27kgTkGEE0zqckf4C1Rp8Y/1nClKEzLVQoL3KYAbnUCxN56cj8rhDR /EpZi9u++B9hzzPHSEJFOLXQkrPmls+wbSwPGjyIP7Px424yLru0YRRmp g==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="281869581" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="281869581" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:27 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="623054112" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga001.jf.intel.com with ESMTP; 28 Jun 2022 12:49:22 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9K022013; Tue, 28 Jun 2022 20:49:20 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 20/52] net, xdp: move XDP metadata helpers into new xdp_meta.h Date: Tue, 28 Jun 2022 21:47:40 +0200 Message-Id: <20220628194812.1453059-21-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC gets included indirectly into tons of different files across the kernel. To not make them dependent on the header files needed for the XDP metadata definitions, which will be used only by several driver and XDP core files, and have the metadata code logically separated, create a new header file, , and move several already existing metadata helpers to it. Signed-off-by: Alexander Lobakin --- MAINTAINERS | 1 + .../ethernet/mellanox/mlx5/core/en/xsk/rx.c | 1 + drivers/net/ethernet/netronome/nfp/nfd3/xsk.c | 1 + drivers/net/tun.c | 2 +- include/net/xdp.h | 20 ------------- include/net/xdp_meta.h | 29 +++++++++++++++++++ net/bpf/core.c | 2 +- net/bpf/prog_ops.c | 1 + net/bpf/test_run.c | 2 +- net/xdp/xsk.c | 2 +- 10 files changed, 37 insertions(+), 24 deletions(-) create mode 100644 include/net/xdp_meta.h diff --git a/MAINTAINERS b/MAINTAINERS index 91190e12a157..24a640c8a306 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21722,6 +21722,7 @@ L: netdev@vger.kernel.org L: bpf@vger.kernel.org S: Supported F: include/net/xdp.h +F: include/net/xdp_meta.h F: include/net/xdp_priv.h F: include/trace/events/xdp.h F: kernel/bpf/cpumap.c diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c index 9a1553598a7c..c1fc5c79d90f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c @@ -3,6 +3,7 @@ #include "rx.h" #include "en/xdp.h" +#include #include #include diff --git a/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c b/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c index 454fea4c8be2..0957e866799b 100644 --- a/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c +++ b/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c @@ -4,6 +4,7 @@ #include #include +#include #include "../nfp_app.h" #include "../nfp_net.h" diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 87a635aac008..0eb0cc6966e4 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -61,7 +61,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/include/net/xdp.h b/include/net/xdp.h index 7b8ba068d28a..1663d0b3a05a 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -378,26 +378,6 @@ int xdp_reg_mem_model(struct xdp_mem_info *mem, enum xdp_mem_type type, void *allocator); void xdp_unreg_mem_model(struct xdp_mem_info *mem); -/* Drivers not supporting XDP metadata can use this helper, which - * rejects any room expansion for metadata as a result. 
- */ -static __always_inline void -xdp_set_data_meta_invalid(struct xdp_buff *xdp) -{ - xdp->data_meta = xdp->data + 1; -} - -static __always_inline bool -xdp_data_meta_unsupported(const struct xdp_buff *xdp) -{ - return unlikely(xdp->data_meta > xdp->data); -} - -static inline bool xdp_metalen_invalid(unsigned long metalen) -{ - return (metalen & (sizeof(__u32) - 1)) || (metalen > 32); -} - struct xdp_attachment_info { struct bpf_prog *prog; u64 btf_id; diff --git a/include/net/xdp_meta.h b/include/net/xdp_meta.h new file mode 100644 index 000000000000..e1f3df9ceb93 --- /dev/null +++ b/include/net/xdp_meta.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (C) 2022, Intel Corporation. */ + +#ifndef __LINUX_NET_XDP_META_H__ +#define __LINUX_NET_XDP_META_H__ + +#include <net/xdp.h> + +/* Drivers not supporting XDP metadata can use this helper, which + * rejects any room expansion for metadata as a result. + */ +static __always_inline void +xdp_set_data_meta_invalid(struct xdp_buff *xdp) +{ + xdp->data_meta = xdp->data + 1; +} + +static __always_inline bool +xdp_data_meta_unsupported(const struct xdp_buff *xdp) +{ + return unlikely(xdp->data_meta > xdp->data); +} + +static inline bool xdp_metalen_invalid(unsigned long metalen) +{ + return (metalen & (sizeof(__u32) - 1)) || (metalen > 32); +} + +#endif /* __LINUX_NET_XDP_META_H__ */ diff --git a/net/bpf/core.c b/net/bpf/core.c index dcd3b6ae86b7..18174d6d8687 100644 --- a/net/bpf/core.c +++ b/net/bpf/core.c @@ -14,7 +14,7 @@ #include #include -#include <net/xdp.h> +#include <net/xdp_meta.h> #include /* struct xdp_mem_allocator */ #include #include diff --git a/net/bpf/prog_ops.c b/net/bpf/prog_ops.c index 33f02842e715..bf174b8d8a36 100644 --- a/net/bpf/prog_ops.c +++ b/net/bpf/prog_ops.c @@ -2,6 +2,7 @@ #include #include +#include <net/xdp_meta.h> #include #include diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 2ca96acbc50a..596b523ccced 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -19,7 +19,7 @@ #include #include #include -#include <net/xdp.h> +#include <net/xdp_meta.h> #define CREATE_TRACE_POINTS #include diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 19ac872a6624..ebf6a67424cd 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -24,7 +24,7 @@ #include #include #include -#include <net/xdp.h> +#include <net/xdp_meta.h> #include "xsk_queue.h" #include "xdp_umem.h" From patchwork Tue Jun 28 19:47:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898854 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11380C43334 for ; Tue, 28 Jun 2022 19:51:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233000AbiF1Tvo (ORCPT ); Tue, 28 Jun 2022 15:51:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231670AbiF1Tux (ORCPT ); Tue, 28 Jun 2022 15:50:53 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0C80381B2; Tue, 28 Jun 2022 12:49:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445768; x=1687981768; h=from:to:cc:subject:date:message-id:in-reply-to:
references:mime-version:content-transfer-encoding; bh=ZoktIBH2DcbAQq3Yb1c6s52MpIMxZuvptyzwO9VbH6M=; b=KDWxERp5hZjnzyL92jLmpa3tHJ/Mbdxukb39r9VL6V4jyQwgzzQMMghj 6yjEMKa/twYYIiE/DLNY+AwHM9eJvOgHykwvmb9zu68rPc5p/n4kNBZtx EAE2ze6DML/eNjnh5jtcPsjS0yXLKelHhT3z6Bk5fjM0ztpALcJsF2Bh6 qR6X93EHtNYA7y+Qqken7ALmII3/VUh/RfjXOd+DbeEVdFKE7uQbdGASw Ql9GWQ98EvK65v/u5O906ftTlHYe3DKsVvhJ8JYxW6cwx8+iEJR0KxwIu Ty+0NW9oFdnsWBIOJfUC0yFxXplpbtJzq8tFqDHA+QgwV/tsB1Zyz2LvY Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="281869586" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="281869586" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="767288101" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga005.jf.intel.com with ESMTP; 28 Jun 2022 12:49:23 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9L022013; Tue, 28 Jun 2022 20:49:22 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 21/52] net, xdp: allow metadata > 32 Date: Tue, 28 Jun 2022 21:47:41 +0200 Message-Id: <20220628194812.1453059-22-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Hardware/driver-prepended XDP metadata might be much bigger than 32 bytes, especially if it includes a piece of a descriptor. Relax the restriction and allow metadata larger than 32 bytes and make __skb_metadata_differs() work with bigger lengths. The new restriction is pretty much mechanical -- skb_shared_info::meta_len is a u8 and XDP_PACKET_HEADROOM is 256 (minus `sizeof(struct xdp_frame)`). The requirement of having its length aligned to 4 bytes is still valid. Signed-off-by: Alexander Lobakin --- include/linux/skbuff.h | 13 ++++++++----- include/net/xdp_meta.h | 21 ++++++++++++++++++++- 2 files changed, 28 insertions(+), 6 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 82edf0359ab3..a825ea7f375d 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4096,10 +4096,13 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a, { const void *a = skb_metadata_end(skb_a); const void *b = skb_metadata_end(skb_b); - /* Using more efficient varaiant than plain call to memcmp(). */ -#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64 u64 diffs = 0; + if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || + BITS_PER_LONG != 64) + goto slow; + + /* Using more efficient variant than plain call to memcmp(). 
*/ switch (meta_len) { #define __it(x, op) (x -= sizeof(u##op)) #define __it_diff(a, b, op) (*(u##op *)__it(a, op)) ^ (*(u##op *)__it(b, op)) @@ -4119,11 +4122,11 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a, fallthrough; case 4: diffs |= __it_diff(a, b, 32); break; + default: +slow: + return memcmp(a - meta_len, b - meta_len, meta_len); } return diffs; -#else - return memcmp(a - meta_len, b - meta_len, meta_len); -#endif } static inline bool skb_metadata_differs(const struct sk_buff *skb_a, diff --git a/include/net/xdp_meta.h b/include/net/xdp_meta.h index e1f3df9ceb93..3a40189d71c6 100644 --- a/include/net/xdp_meta.h +++ b/include/net/xdp_meta.h @@ -5,6 +5,7 @@ #define __LINUX_NET_XDP_META_H__ #include +#include /* Drivers not supporting XDP metadata can use this helper, which * rejects any room expansion for metadata as a result. @@ -21,9 +22,27 @@ xdp_data_meta_unsupported(const struct xdp_buff *xdp) return unlikely(xdp->data_meta > xdp->data); } +/** + * xdp_metalen_invalid -- check if the length of a frame's metadata is valid + * @metalen: the length of the frame's metadata + * + * skb_shared_info::meta_len is 1 byte long, thus it can't be longer than + * 255, but this can always change. XDP_PACKET_HEADROOM is 256, and this is a + * UAPI. sizeof(struct xdp_frame) is reserved since xdp_frame is being placed + * at xdp_buff::data_hard_start whilst being constructed on XDP_REDIRECT. + * The 32-bit alignment requirement is arbitrary, kept for simplicity and, + * sometimes, speed. + */ static inline bool xdp_metalen_invalid(unsigned long metalen) { - return (metalen & (sizeof(__u32) - 1)) || (metalen > 32); + typeof(metalen) max; + + max = min_t(typeof(max), + (typeof_member(struct skb_shared_info, meta_len))~0UL, + XDP_PACKET_HEADROOM - sizeof(struct xdp_frame)); + BUILD_BUG_ON(!__builtin_constant_p(max)); + + return (metalen & (sizeof(u32) - 1)) || metalen > max; } #endif /* __LINUX_NET_XDP_META_H__ */ From patchwork Tue Jun 28 19:47:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898856 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 604CCC43334 for ; Tue, 28 Jun 2022 19:51:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233040AbiF1Tvt (ORCPT ); Tue, 28 Jun 2022 15:51:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45106 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230089AbiF1Tuw (ORCPT ); Tue, 28 Jun 2022 15:50:52 -0400 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 93FFF62FA; Tue, 28 Jun 2022 12:49:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445769; x=1687981769; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SutG8vtwjgxHponFr0JSgxWBk/og3Hhe0diuNsWKHbE=; b=MTibYOcQnodnkJWUOqTgp6T/m8KOaAMZuCjoqwkBhxDSAFVJeuGWDzA0 gluNCtMqofVjmVOy5+RApv0SdK1VgM/ebRjUcu3Fhd7U011/mCYSnlcTX w11c8ieCNDmU/dirMoiGk99Z7MizVZxMEudaEhjbH8niNuSRfxyuoj/Sq GPj4rTkSIzlEktthH8uuMAmNoy7omx0/Bn+kWUwszpe9EhMYnpthfUieE
mZm1pNzIhT/nbo/bjJigZtA0FDKH1dya4lQB9eJH4Ci3HIsIllDTc93YO 72mPI5jR/ZRiC00Z775Zvm0WxiJhdD5c0UDJTg7nH/Q0cu65A3CatPToJ g==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="368146910" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="368146910" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="732883398" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga001.fm.intel.com with ESMTP; 28 Jun 2022 12:49:25 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9M022013; Tue, 28 Jun 2022 20:49:23 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 22/52] net, skbuff: add ability to skip skb metadata comparison Date: Tue, 28 Jun 2022 21:47:42 +0200 Message-Id: <20220628194812.1453059-23-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Some XDP metadata fields may be unique from frame to frame, not necessarily indicating that a frame belongs to a different flow. This includes frame checksums, timestamps etc. The drivers usually carry the metadata to skbs along with the payload, and the GRO layer tries to compare the metadata of the frames. This not only leads to perf regressions (esp. given that metadata can now be larger than 32 bytes -> a slower call to memcmp() will be used), but can also break frame coalescing entirely. To avoid that, add an skb flag indicating that the metadata can carry unique values and thus should not be compared. If at least one of the skbs passed to skb_metadata_differs() carries it, the function will then immediately return, reporting that they're identical. The underscored version of the function is not affected, allowing the meta to be compared explicitly if needed. The flag is cleared in pskb_expand_head() when the skb_shared_info::meta_len gets zeroed. Signed-off-by: Alexander Lobakin --- include/linux/skbuff.h | 18 ++++++++++++++++++ net/core/skbuff.c | 1 + 2 files changed, 19 insertions(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a825ea7f375d..1c308511acbb 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -509,6 +509,11 @@ enum { * charged to the kernel memory. */ SKBFL_PURE_ZEROCOPY = BIT(2), + + /* skb metadata may contain unique values such as checksums + * and we should not compare it against others.
+ */ + SKBFL_METADATA_NOCOMP = BIT(3), }; #define SKBFL_ZEROCOPY_FRAG (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG) @@ -4137,6 +4142,9 @@ static inline bool skb_metadata_differs(const struct sk_buff *skb_a, if (!(len_a | len_b)) return false; + if ((skb_shinfo(skb_a)->flags | skb_shinfo(skb_b)->flags) & + SKBFL_METADATA_NOCOMP) + return false; return len_a != len_b ? true : __skb_metadata_differs(skb_a, skb_b, len_a); @@ -4152,6 +4160,16 @@ static inline void skb_metadata_clear(struct sk_buff *skb) skb_metadata_set(skb, 0); } +static inline void skb_metadata_nocomp_set(struct sk_buff *skb) +{ + skb_shinfo(skb)->flags |= SKBFL_METADATA_NOCOMP; +} + +static inline void skb_metadata_nocomp_clear(struct sk_buff *skb) +{ + skb_shinfo(skb)->flags &= ~SKBFL_METADATA_NOCOMP; +} + struct sk_buff *skb_clone_sk(struct sk_buff *skb); #ifdef CONFIG_NETWORK_PHY_TIMESTAMPING diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 00bf35ee8205..5b23fc7f1157 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1750,6 +1750,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, atomic_set(&skb_shinfo(skb)->dataref, 1); skb_metadata_clear(skb); + skb_metadata_nocomp_clear(skb); /* It is not generally safe to change skb->truesize. * For the moment, we really care of rx path, or From patchwork Tue Jun 28 19:47:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898886 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF317CCA479 for ; Tue, 28 Jun 2022 19:54:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233755AbiF1Tyk (ORCPT ); Tue, 28 Jun 2022 15:54:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45624 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232417AbiF1Tu4 (ORCPT ); Tue, 28 Jun 2022 15:50:56 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58F5BB7F8; Tue, 28 Jun 2022 12:49:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445788; x=1687981788; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rckiSPr5+wxzNkTahHMP+SEYFlLnWpiNQnrV9iaYFnc=; b=lpXUJAnR4zJFDkwxGVUdkIWl5zcBMBVkQjxXEJprOvwskiwgvW3bJS04 IImA/5Qsa3vjofiWXzTW6IKuHyXGKC/MAhyde+lWWnDyIZ9rFKYQHV7Jj uR9dZjxcTxf1oC8fqGwawDVuqafWN4OlRrE1e7sEE/HQamZEimRKh7KBx qUrbyJmBsEc+/4jNftMav/Fu10nwNp/L2dZWp/Iaxdl0N4t+f+aq8EJ1e 9mBrgdQA48LUcTtS6bNIFIAIr8MwzOaTFMVQLvxSSGMPNzUfwmGLCEOa0 60ouWlvUOTJkmirwIHOL9JAU3kHxZwQrO4/jNkaJlv+bg9nhCaVxgLunI Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="281869598" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="281869598" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="767288113" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga005.jf.intel.com with ESMTP; 28 Jun 2022 12:49:26 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by 
irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9N022013; Tue, 28 Jun 2022 20:49:24 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 23/52] net, skbuff: constify the @skb argument of skb_hwtstamps() Date: Tue, 28 Jun 2022 21:47:43 +0200 Message-Id: <20220628194812.1453059-24-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC skb_hwtstamps() only dereferences the &skb_shared_info pointer, so the @skb argument doesn't need to be writable. Constify it to be able to pass const pointers to the code which uses this function and give the compilers a little more room for optimization. As an example, constify the @skb argument of tpacket_get_timestamp() and __packet_set_timestamp() of the AF_PACKET core code. There are a lot more places in the kernel where similar micro-optimizations can be done in the future. Signed-off-by: Alexander Lobakin --- include/linux/skbuff.h | 3 ++- net/packet/af_packet.c | 8 ++++---- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 1c308511acbb..0a95f753c1d9 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1617,7 +1617,8 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, /* Internal */ #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB))) -static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb) +static inline struct skb_shared_hwtstamps * +skb_hwtstamps(const struct sk_buff *skb) { return &skb_shinfo(skb)->hwtstamps; } diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index d08c4728523b..20eac049e69e 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -449,10 +449,10 @@ static int __packet_get_status(const struct packet_sock *po, void *frame) } } -static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec64 *ts, - unsigned int flags) +static __u32 tpacket_get_timestamp(const struct sk_buff *skb, + struct timespec64 *ts, unsigned int flags) { - struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); + const struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); if (shhwtstamps && (flags & SOF_TIMESTAMPING_RAW_HARDWARE) && @@ -467,7 +467,7 @@ static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec64 *ts, } static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame, - struct sk_buff *skb) + const struct sk_buff *skb) { union tpacket_uhdr h; struct timespec64 ts; From patchwork Tue Jun 28 19:47:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898861 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version:
SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30834C43334 for ; Tue, 28 Jun 2022 19:53:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229628AbiF1TxQ (ORCPT ); Tue, 28 Jun 2022 15:53:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232457AbiF1Tu5 (ORCPT ); Tue, 28 Jun 2022 15:50:57 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D5CAD1FCD4; Tue, 28 Jun 2022 12:49:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445790; x=1687981790; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jaSXqFe1wqX1hqacI7CTj2oedil7nvHz4K/7STA+keI=; b=N0B9qgAboZogUAyD+8k87qAz7sQb5bvmMGkpwDW0iBJ4g8NBwUXZrMcN v89RTJ6zO0BRpcvQvPHbZW1OcYs10X0I++wJ68NdHaY3OUA7rY8k3PVR5 oB9BI6iZROpFjqIu5bziauhVmqy8LwRKadfiRoWLrukqGk8E/HQR0guR7 X50uFsKYUcxgJKfduxXgunJfUB4Haab2JD5oH1+eCfuR9Cai1vrl9yukf WmgaauBj036acS/qYGvp6BPqWlcOemwY8r0GF109k3DLNIY/LOgT1LUi3 MoqwFWbcBY5eaHFmUxK2MMr0syH7pFIYSHSEVVFCguEsOzcsrRlJ3psnL g==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="281869605" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="281869605" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="767288117" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga005.jf.intel.com with ESMTP; 28 Jun 2022 12:49:27 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9O022013; Tue, 28 Jun 2022 20:49:26 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 24/52] bpf, xdp: declare generic XDP metadata structure Date: Tue, 28 Jun 2022 21:47:44 +0200 Message-Id: <20220628194812.1453059-25-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC From: Michal Swiatkowski The generic XDP metadata is a driver-independent "header" which carries the essential info such as the checksum status, the hash etc. It can be composed by both hardware and software (drivers) and is designed to pass that info, usually taken from the NIC descriptors, between the different subsystems and layers in one unified format. 
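To make that layout concrete, here is a minimal consumer sketch (illustrative only, not part of the patch; it assumes the definitions introduced below, and the helper name is made up):

/* Sketch: the generic meta ends right at the Ethernet header, i.e. at
 * xdp->data, so locate it there, verify the magic, then read one LE
 * field with an explicit byteswap.
 */
static inline u32 sketch_get_rx_hash(const struct xdp_buff *xdp)
{
	const struct xdp_meta_generic *md;

	if (xdp_data_meta_unsupported(xdp) ||
	    xdp->data - xdp->data_meta < (long)sizeof(*md))
		return 0;

	md = xdp->data - sizeof(*md);
	if (le16_to_cpu(md->magic_id) != XDP_META_GENERIC_MAGIC)
		return 0;

	/* XDP_META_RX_HASH_NONE (0x0) means no hash was provided */
	if (!(le32_to_cpu(md->rx_flags) & XDP_META_RX_HASH_TYPE))
		return 0;

	return le32_to_cpu(md->rx_hash);
}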
As it's "cross-everything" and can be composed by hardware (primarily SmartNICs), an explicit Endianness is required. Most hardware and hosts operate in LE nowadays, so the choice was obvious although network frames themselves are in BE. The byteswap macros will be no-ops for LE systems. The first and the last field must always be 2-byte one to have a natural alignment of 4 and 8 byte members on 32-bit platforms where there's an "IP align" 2-byte padding in front of the data: the first member paired with that padding makes the next one aligned to 4 bytes, the last one stacks with the Ethernet header to make its end aligned to 4 bytes. As it's being prepended right in front of the Ethernet header, it grows to the left, so all new fields must be added at the beginning of the structure in the future. The related definitions are declared inside an enum so that they're visible to BPF programs. The struct is declared in UAPI so AF_XDP programs, which can work with metadata as well, would have access to it. Signed-off-by: Michal Swiatkowski Co-developed-by: Larysa Zaremba Signed-off-by: Larysa Zaremba Co-developed-by: Alexander Lobakin Signed-off-by: Alexander Lobakin --- include/uapi/linux/bpf.h | 173 +++++++++++++++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 173 +++++++++++++++++++++++++++++++++ 2 files changed, 346 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 372170ded1d8..1caaec1de625 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -8,6 +8,7 @@ #ifndef _UAPI__LINUX_BPF_H__ #define _UAPI__LINUX_BPF_H__ +#include #include #include @@ -6859,4 +6860,176 @@ struct bpf_core_relo { enum bpf_core_relo_kind kind; }; +/* Definitions being used to work with &xdp_meta_generic, declared as an enum + * so they are visible for BPF programs via vmlinux.h. 
+ */
+enum xdp_meta_generic_defs {
+	/* xdp_meta_generic::tx_flags */
+
+	/* Mask of bits containing Tx timestamp action */
+	XDP_META_TX_TSTAMP_ACT = (0x3 << 4),
+	/* No action is needed */
+	XDP_META_TX_TSTAMP_NONE = 0x0,
+	/* %SO_TIMESTAMP command */
+	XDP_META_TX_TSTAMP_SOCK = 0x1,
+	/* Set the value to the actual time when a packet is sent */
+	XDP_META_TX_TSTAMP_COMP = 0x2,
+	/* Mask of bits containing Tx VLAN action */
+	XDP_META_TX_VLAN_TYPE = (0x3 << 2),
+	/* No action is needed */
+	XDP_META_TX_VLAN_NONE = 0x0,
+	/* NIC must push C-VLAN tag */
+	XDP_META_TX_CVID = 0x1,
+	/* NIC must push S-VLAN tag */
+	XDP_META_TX_SVID = 0x2,
+	/* Mask of bits containing Tx checksum action */
+	XDP_META_TX_CSUM_ACT = (0x3 << 0),
+	/* No action for checksum */
+	XDP_META_TX_CSUM_ASIS = 0x0,
+	/* NIC must compute checksum, no start/offset are provided */
+	XDP_META_TX_CSUM_AUTO = 0x1,
+	/* NIC must compute checksum using the provided start and offset */
+	XDP_META_TX_CSUM_HELP = 0x2,
+
+	/* xdp_meta_generic::rx_flags */
+
+	/* Metadata contains valid Rx queue ID */
+	XDP_META_RX_QID_PRESENT = (0x1 << 9),
+	/* Metadata contains valid Rx timestamp */
+	XDP_META_RX_TSTAMP_PRESENT = (0x1 << 8),
+	/* Mask of bits containing Rx VLAN status */
+	XDP_META_RX_VLAN_TYPE = (0x3 << 6),
+	/* Metadata does not have any VLAN tags */
+	XDP_META_RX_VLAN_NONE = 0x0,
+	/* Metadata carries valid C-VLAN tag */
+	XDP_META_RX_CVID = 0x1,
+	/* Metadata carries valid S-VLAN tag */
+	XDP_META_RX_SVID = 0x2,
+	/* Mask of bits containing Rx hash status */
+	XDP_META_RX_HASH_TYPE = (0x3 << 4),
+	/* Metadata has no RSS hash */
+	XDP_META_RX_HASH_NONE = 0x0,
+	/* Metadata has valid L2 hash */
+	XDP_META_RX_HASH_L2 = 0x1,
+	/* Metadata has valid L3 hash */
+	XDP_META_RX_HASH_L3 = 0x2,
+	/* Metadata has valid L4 hash */
+	XDP_META_RX_HASH_L4 = 0x3,
+	/* Mask of the field containing checksum level (if there's encap) */
+	XDP_META_RX_CSUM_LEVEL = (0x3 << 2),
+	/* Mask of bits containing Rx checksum status */
+	XDP_META_RX_CSUM_STATUS = (0x3 << 0),
+	/* Metadata has no checksum info */
+	XDP_META_RX_CSUM_NONE = 0x0,
+	/* Checksum has been verified by NIC */
+	XDP_META_RX_CSUM_OK = 0x1,
+	/* Metadata carries valid checksum */
+	XDP_META_RX_CSUM_COMP = 0x2,
+
+	/* xdp_meta_generic::magic_id indicates that the metadata is either
+	 * struct xdp_meta_generic itself or contains it at the end -> can be
+	 * used to get/set HW hints.
+	 * Direct btf_id comparison is not enough here as a custom structure
+	 * carrying xdp_meta_generic at the end will have a different ID.
+	 */
+	XDP_META_GENERIC_MAGIC = 0xeda6,
+};
+
+/* Generic metadata can be composed directly by HW, plus it should always
+ * have the first field as __le16 to account for the 2 bytes of "IP align",
+ * so we pack it to avoid unexpected paddings. Also, it should be aligned to
+ * sizeof(__be16) as any other Ethernet data, and to optimize access on
+ * 32-bit platforms.
+ */
+#define __xdp_meta_generic_attrs \
+	__attribute__((__packed__)) \
+	__attribute__((aligned(sizeof(__be16))))
+
+/* Depending on the field layout inside the structure, it might or might not
+ * emit a "packed attribute is unnecessary" warning (when enabled, e.g. in
+ * libbpf). To not add and remove the attributes on each field addition,
+ * just suppress it.
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpacked"
+
+/* All fields have explicit endianness, as it might be composed by HW.
+ * Byteswaps are needed for the Big Endian architectures to access the
+ * fields.
+ */ +struct xdp_meta_generic { + /* Add new fields here */ + + /* Egress part */ + __struct_group(/* no tag */, tx, __xdp_meta_generic_attrs, + /* Offset from the start of the frame to the L4 header + * to compute checksum for + */ + __le16 tx_csum_start; + /* Offset inside the L4 header to the checksum field */ + __le16 tx_csum_off; + /* ID for hardware VLAN push */ + __le16 tx_vid; + /* Flags indicating which Tx metadata is used */ + __le32 tx_flags; + /* Tx timestamp value */ + __le64 tx_tstamp; + ); + + /* Shortcut for the half relevant on ingress: Rx + IDs */ + __struct_group(xdp_meta_generic_rx, rx_full, __xdp_meta_generic_attrs, + /* Ingress part */ + __struct_group(/* no tag */, rx, __xdp_meta_generic_attrs, + /* Rx timestamp value */ + __le64 rx_tstamp; + /* Rx hash value */ + __le32 rx_hash; + /* Rx checksum value */ + __le32 rx_csum; + /* VLAN ID popped on Rx */ + __le16 rx_vid; + /* Rx queue ID on which the frame has arrived */ + __le16 rx_qid; + /* Flags indicating which Rx metadata is used */ + __le32 rx_flags; + ); + + /* Unique metadata identifiers */ + __struct_group(/* no tag */, id, __xdp_meta_generic_attrs, + union { + struct { +#ifdef __BIG_ENDIAN_BITFIELD + /* Indicates the ID of the BTF which + * the below type ID comes from, as + * several kernel modules may have + * identical type IDs + */ + __le32 btf_id; + /* Indicates the ID of the actual + * structure passed as metadata, + * within the above BTF ID + */ + __le32 type_id; +#else /* __LITTLE_ENDIAN_BITFIELD */ + __le32 type_id; + __le32 btf_id; +#endif /* __LITTLE_ENDIAN_BITFIELD */ + }; + /* BPF program gets IDs coded as one __u64: + * `btf_id << 32 | type_id`, allow direct + * comparison + */ + __le64 full_id; + }; + /* If set to the correct value, indicates that the + * meta is generic-compatible and can be used by + * the consumers of generic metadata + */ + __le16 magic_id; + ); + ); +} __xdp_meta_generic_attrs; + +#pragma GCC diagnostic pop + #endif /* _UAPI__LINUX_BPF_H__ */ diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 372170ded1d8..436b925adfb3 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -8,6 +8,7 @@ #ifndef _UAPI__LINUX_BPF_H__ #define _UAPI__LINUX_BPF_H__ +#include #include #include @@ -6859,4 +6860,176 @@ struct bpf_core_relo { enum bpf_core_relo_kind kind; }; +/* Definitions being used to work with &xdp_meta_generic, declared as an enum + * so they are visible for BPF programs via vmlinux.h. 
+ */
+enum xdp_meta_generic_defs {
+	/* xdp_meta_generic::tx_flags */
+
+	/* Mask of bits containing Tx timestamp action */
+	XDP_META_TX_TSTAMP_ACT = (0x3 << 4),
+	/* No action is needed */
+	XDP_META_TX_TSTAMP_NONE = 0x0,
+	/* %SO_TIMESTAMP command */
+	XDP_META_TX_TSTAMP_SOCK = 0x1,
+	/* Set the value to the actual time when a packet is sent */
+	XDP_META_TX_TSTAMP_COMP = 0x2,
+	/* Mask of bits containing Tx VLAN action */
+	XDP_META_TX_VLAN_TYPE = (0x3 << 2),
+	/* No action is needed */
+	XDP_META_TX_VLAN_NONE = 0x0,
+	/* NIC must push C-VLAN tag */
+	XDP_META_TX_CVID = 0x1,
+	/* NIC must push S-VLAN tag */
+	XDP_META_TX_SVID = 0x2,
+	/* Mask of bits containing Tx checksum action */
+	XDP_META_TX_CSUM_ACT = (0x3 << 0),
+	/* No action for checksum */
+	XDP_META_TX_CSUM_ASIS = 0x0,
+	/* NIC must compute checksum, no start/offset are provided */
+	XDP_META_TX_CSUM_AUTO = 0x1,
+	/* NIC must compute checksum using the provided start and offset */
+	XDP_META_TX_CSUM_HELP = 0x2,
+
+	/* xdp_meta_generic::rx_flags */
+
+	/* Metadata contains valid Rx queue ID */
+	XDP_META_RX_QID_PRESENT = (0x1 << 9),
+	/* Metadata contains valid Rx timestamp */
+	XDP_META_RX_TSTAMP_PRESENT = (0x1 << 8),
+	/* Mask of bits containing Rx VLAN status */
+	XDP_META_RX_VLAN_TYPE = (0x3 << 6),
+	/* Metadata does not have any VLAN tags */
+	XDP_META_RX_VLAN_NONE = 0x0,
+	/* Metadata carries valid C-VLAN tag */
+	XDP_META_RX_CVID = 0x1,
+	/* Metadata carries valid S-VLAN tag */
+	XDP_META_RX_SVID = 0x2,
+	/* Mask of bits containing Rx hash status */
+	XDP_META_RX_HASH_TYPE = (0x3 << 4),
+	/* Metadata has no RSS hash */
+	XDP_META_RX_HASH_NONE = 0x0,
+	/* Metadata has valid L2 hash */
+	XDP_META_RX_HASH_L2 = 0x1,
+	/* Metadata has valid L3 hash */
+	XDP_META_RX_HASH_L3 = 0x2,
+	/* Metadata has valid L4 hash */
+	XDP_META_RX_HASH_L4 = 0x3,
+	/* Mask of the field containing checksum level (if there's encap) */
+	XDP_META_RX_CSUM_LEVEL = (0x3 << 2),
+	/* Mask of bits containing Rx checksum status */
+	XDP_META_RX_CSUM_STATUS = (0x3 << 0),
+	/* Metadata has no checksum info */
+	XDP_META_RX_CSUM_NONE = 0x0,
+	/* Checksum has been verified by NIC */
+	XDP_META_RX_CSUM_OK = 0x1,
+	/* Metadata carries valid checksum */
+	XDP_META_RX_CSUM_COMP = 0x2,
+
+	/* xdp_meta_generic::magic_id indicates that the metadata is either
+	 * struct xdp_meta_generic itself or contains it at the end -> can be
+	 * used to get/set HW hints.
+	 * Direct btf_id comparison is not enough here as a custom structure
+	 * carrying xdp_meta_generic at the end will have a different ID.
+	 */
+	XDP_META_GENERIC_MAGIC = 0xeda6,
+};
+
+/* Generic metadata can be composed directly by HW, plus it should always
+ * have the first field as __le16 to account for the 2 bytes of "IP align",
+ * so we pack it to avoid unexpected paddings. Also, it should be aligned to
+ * sizeof(__be16) as any other Ethernet data, and to optimize access on
+ * 32-bit platforms.
+ */
+#define __xdp_meta_generic_attrs \
+	__attribute__((__packed__)) \
+	__attribute__((aligned(sizeof(__be16))))
+
+/* Depending on the field layout inside the structure, it might or might not
+ * emit a "packed attribute is unnecessary" warning (when enabled, e.g. in
+ * libbpf). To not add and remove the attributes on each field addition,
+ * just suppress it.
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpacked"
+
+/* All fields have explicit endianness, as it might be composed by HW.
+ * Byteswaps are needed for the Big Endian architectures to access the
+ * fields.
+ */ +struct xdp_meta_generic { + /* Add new fields here */ + + /* Egress part */ + __struct_group(/* no tag */, tx, __xdp_meta_generic_attrs, + /* Offset from the start of the frame to the L4 header + * to compute checksum for + */ + __le16 tx_csum_start; + /* Offset inside the L4 header to the checksum field */ + __le16 tx_csum_off; + /* ID for hardware VLAN push */ + __le16 tx_vid; + /* Flags indicating which Tx metadata is used */ + __le32 tx_flags; + /* Tx timestamp value */ + __le64 tx_tstamp; + ); + + /* Shortcut for the half relevant on ingress: Rx + IDs */ + __struct_group(xdp_meta_generic_rx, rx_full, __xdp_meta_generic_attrs, + /* Ingress part */ + __struct_group(/* no tag */, rx, __xdp_meta_generic_attrs, + /* Rx timestamp value */ + __le64 rx_tstamp; + /* Rx hash value */ + __le32 rx_hash; + /* Rx checksum value */ + __le32 rx_csum; + /* VLAN ID popped on Rx */ + __le16 rx_vid; + /* Rx queue ID on which the frame has arrived */ + __le16 rx_qid; + /* Flags indicating which Rx metadata is used */ + __le32 rx_flags; + ); + + /* Unique metadata identifiers */ + __struct_group(/* no tag */, id, __xdp_meta_generic_attrs, + union { + struct { +#ifdef __BIG_ENDIAN_BITFIELD + /* Indicates the ID of the BTF which + * the below type ID comes from, as + * several kernel modules may have + * identical type IDs + */ + __le32 btf_id; + /* Indicates the ID of the actual + * structure passed as metadata, + * within the above BTF ID + */ + __le32 type_id; +#else /* __LITTLE_ENDIAN_BITFIELD */ + __le32 type_id; + __le32 btf_id; +#endif /* __LITTLE_ENDIAN_BITFIELD */ + }; + /* BPF program gets IDs coded as one __u64: + * `btf_id << 32 | type_id`, allow direct + * comparison + */ + __le64 full_id; + }; + /* If set to the correct value, indicates that the + * meta is generic-compatible and can be used by + * the consumers of generic metadata + */ + __le16 magic_id; + ); + ); +} __xdp_meta_generic_attrs; + +#pragma GCC diagnostic pop + #endif /* _UAPI__LINUX_BPF_H__ */ From patchwork Tue Jun 28 19:47:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898858 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA40AC43334 for ; Tue, 28 Jun 2022 19:53:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232063AbiF1TwS (ORCPT ); Tue, 28 Jun 2022 15:52:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231273AbiF1Tuy (ORCPT ); Tue, 28 Jun 2022 15:50:54 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 17655223; Tue, 28 Jun 2022 12:49:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445774; x=1687981774; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=NHfK9H6RSRv2T0ejOH0djO6jg9We4wQHQ8pCfIZVGsQ=; b=lQQybeuzYeIdKVoitwuoX+OTzSkc8kyMsRZP1B2C7LF/Fxu3af3JsQyh OvQd+CsmdxTB2ALPlHvUFG94OrpzcylhHrlHWIGJC6aGs5ZrfEBeeYuAj OaPy+wfuWlX8lpru+GK0UalDJIdP16idEIJjuR1oGLzFe9NBslEAmqEQU e5Cj7s07tySYz9QKwl4A5TcgsImQVL+9+G4I5J1o05z0Rg/QGiIDcndBR 
From: Alexander Lobakin
Subject: [PATCH RFC bpf-next 25/52] net, xdp: add basic generic metadata accessors
Date: Tue, 28 Jun 2022 21:47:45 +0200
Message-Id: <20220628194812.1453059-26-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
X-Patchwork-State: RFC

As all of the fields in the generic XDP metadata structure have explicit endianness, it's worth providing some basic helpers. Add _get() and _set() accessors for each field, plus _get(), _set() and _rep() accessors for each bitfield of ::{rx,tx}_flags. The _rep() ones are for the cases when it's unknown whether a flags field is clear: they effectively replace the value in a bitfield instead of just ORing it in.
Also add a couple of helpers: one to get a pointer to the generic metadata structure and one to check whether a given metadata is generic-compatible.
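For example, a driver composing the hints on a pre-zeroed headroom could do roughly this (an illustrative sketch; example_fill_rx_hints() and its hash/qid inputs are made-up stand-ins for whatever the actual NIC descriptor and ring provide):

	static void example_fill_rx_hints(struct xdp_buff *xdp, u32 hash, u16 qid)
	{
		struct xdp_meta_generic *md = xdp_meta_generic_ptr(xdp->data);

		/* Headroom assumed pre-zeroed, so the faster _set() ops suffice */
		xdp_meta_magic_id_set(md, XDP_META_GENERIC_MAGIC);
		xdp_meta_rx_hash_set(md, hash);
		xdp_meta_rx_hash_type_set(md, XDP_META_RX_HASH_L4);
		xdp_meta_rx_qid_present_set(md, 1);
		xdp_meta_rx_qid_set(md, qid);
	}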
Signed-off-by: Alexander Lobakin
---
 include/net/xdp_meta.h | 238 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 238 insertions(+)

diff --git a/include/net/xdp_meta.h b/include/net/xdp_meta.h
index 3a40189d71c6..f61831e39eb0 100644
--- a/include/net/xdp_meta.h
+++ b/include/net/xdp_meta.h
@@ -4,6 +4,7 @@
 #ifndef __LINUX_NET_XDP_META_H__
 #define __LINUX_NET_XDP_META_H__
 
+#include
 #include
 #include
 
@@ -45,4 +46,241 @@ static inline bool xdp_metalen_invalid(unsigned long metalen)
 	return (metalen & (sizeof(u32) - 1)) || metalen > max;
 }
 
+/* This builds _get(), _set() and _rep() for each bitfield.
+ * If you know for sure the field is empty (e.g. you zeroed the struct
+ * previously), use the faster _set() op to save several cycles; otherwise,
+ * use _rep() to avoid mixing values.
+ */
+#define XDP_META_BUILD_FLAGS_ACC(dir, pfx, FLD)				\
+static inline u32							\
+xdp_meta_##dir##_##pfx##_get(const struct xdp_meta_generic *md)	\
+{									\
+	static_assert(__same_type(md->dir##_flags, __le32));		\
+									\
+	return le32_get_bits(md->dir##_flags, XDP_META_##FLD);		\
+}									\
+									\
+static inline void							\
+xdp_meta_##dir##_##pfx##_set(struct xdp_meta_generic *md, u32 val)	\
+{									\
+	md->dir##_flags |= le32_encode_bits(val, XDP_META_##FLD);	\
+}									\
+									\
+static inline void							\
+xdp_meta_##dir##_##pfx##_rep(struct xdp_meta_generic *md, u32 val)	\
+{									\
+	le32p_replace_bits(&md->dir##_flags, val, XDP_META_##FLD);	\
+}
+
+/* This builds _get() and _set() for each structure field -- those are just
+ * byteswap operations, however.
+ * The second static assertion ensures that all of the fields in the
+ * structure are naturally aligned when ::magic_id starts at
+ * `XDP_PACKET_HEADROOM + 8n`, which is the default and recommended case.
+ * This check makes no sense for the efficient-unaligned-access platforms,
+ * but helps the rest.
+ */
+#define XDP_META_BUILD_ACC(dir, pfx, sz)				\
+static inline u##sz							\
+xdp_meta_##dir##_##pfx##_get(const struct xdp_meta_generic *md)	\
+{									\
+	static_assert(__same_type(md->dir##_##pfx, __le##sz));		\
+									\
+	return le##sz##_to_cpu(md->dir##_##pfx);			\
+}									\
+									\
+static inline void							\
+xdp_meta_##dir##_##pfx##_set(struct xdp_meta_generic *md, u##sz val)	\
+{									\
+	static_assert((XDP_PACKET_HEADROOM - sizeof(*md) +		\
+		       sizeof_field(typeof(*md), magic_id) +		\
+		       offsetof(typeof(*md), dir##_##pfx)) %		\
+		      sizeof_field(typeof(*md), dir##_##pfx) == 0);	\
+									\
+	md->dir##_##pfx = cpu_to_le##sz(val);				\
+}
+
+#if 0 /* For grepping/indexers */
+u16 xdp_meta_tx_csum_action_get(const struct xdp_meta_generic *md);
+void xdp_meta_tx_csum_action_set(struct xdp_meta_generic *md, u16 val);
+void xdp_meta_tx_csum_action_rep(struct xdp_meta_generic *md, u16 val);
+u16 xdp_meta_tx_vlan_type_get(const struct xdp_meta_generic *md);
+void xdp_meta_tx_vlan_type_set(struct xdp_meta_generic *md, u16 val);
+void xdp_meta_tx_vlan_type_rep(struct xdp_meta_generic *md, u16 val);
+u16 xdp_meta_tx_tstamp_action_get(const struct xdp_meta_generic *md);
+void xdp_meta_tx_tstamp_action_set(struct xdp_meta_generic *md, u16 val);
+void xdp_meta_tx_tstamp_action_rep(struct xdp_meta_generic *md, u16 val);
+#endif
+XDP_META_BUILD_FLAGS_ACC(tx, csum_action, TX_CSUM_ACT);
+XDP_META_BUILD_FLAGS_ACC(tx, vlan_type, TX_VLAN_TYPE);
+XDP_META_BUILD_FLAGS_ACC(tx, tstamp_action, TX_TSTAMP_ACT);
+
+#if 0
+u16 xdp_meta_tx_csum_start_get(const struct xdp_meta_generic *md);
+void xdp_meta_tx_csum_start_set(struct xdp_meta_generic *md, u64 val);
+u16 xdp_meta_tx_csum_off_get(const struct xdp_meta_generic *md);
+void xdp_meta_tx_csum_off_set(struct xdp_meta_generic *md, u64 val);
+u16 xdp_meta_tx_vid_get(const struct xdp_meta_generic *md);
+void xdp_meta_tx_vid_set(struct xdp_meta_generic *md, u64 val);
+u32 xdp_meta_tx_flags_get(const struct xdp_meta_generic *md);
+void xdp_meta_tx_flags_set(struct xdp_meta_generic *md, u32 val);
+u64 xdp_meta_tx_tstamp_get(const struct xdp_meta_generic *md);
+void xdp_meta_tx_tstamp_set(struct xdp_meta_generic *md, u64 val);
+#endif
+XDP_META_BUILD_ACC(tx, csum_start, 16);
+XDP_META_BUILD_ACC(tx, csum_off, 16);
+XDP_META_BUILD_ACC(tx, vid, 16);
+XDP_META_BUILD_ACC(tx, flags, 32);
+XDP_META_BUILD_ACC(tx, tstamp, 64);
+
+#if 0
+u16 xdp_meta_rx_csum_status_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_csum_status_set(struct xdp_meta_generic *md, u16 val);
+void xdp_meta_rx_csum_status_rep(struct xdp_meta_generic *md, u16 val);
+u16 xdp_meta_rx_csum_level_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_csum_level_set(struct xdp_meta_generic *md, u16 val);
+void xdp_meta_rx_csum_level_rep(struct xdp_meta_generic *md, u16 val);
+u16 xdp_meta_rx_hash_type_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_hash_type_set(struct xdp_meta_generic *md, u16 val);
+void xdp_meta_rx_hash_type_rep(struct xdp_meta_generic *md, u16 val);
+u16 xdp_meta_rx_vlan_type_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_vlan_type_set(struct xdp_meta_generic *md, u16 val);
+void xdp_meta_rx_vlan_type_rep(struct xdp_meta_generic *md, u16 val);
+u16 xdp_meta_rx_tstamp_present_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_tstamp_present_set(struct xdp_meta_generic *md, u16 val);
+void xdp_meta_rx_tstamp_present_rep(struct xdp_meta_generic *md, u16 val);
+u16 xdp_meta_rx_qid_present_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_qid_present_set(struct xdp_meta_generic *md, u16 val);
+void xdp_meta_rx_qid_present_rep(struct xdp_meta_generic *md, u16 val);
+#endif
+XDP_META_BUILD_FLAGS_ACC(rx, csum_status, RX_CSUM_STATUS);
+XDP_META_BUILD_FLAGS_ACC(rx, csum_level, RX_CSUM_LEVEL);
+XDP_META_BUILD_FLAGS_ACC(rx, hash_type, RX_HASH_TYPE);
+XDP_META_BUILD_FLAGS_ACC(rx, vlan_type, RX_VLAN_TYPE);
+XDP_META_BUILD_FLAGS_ACC(rx, tstamp_present, RX_TSTAMP_PRESENT);
+XDP_META_BUILD_FLAGS_ACC(rx, qid_present, RX_QID_PRESENT);
+
+#if 0
+u64 xdp_meta_rx_tstamp_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_tstamp_set(struct xdp_meta_generic *md, u64 val);
+u32 xdp_meta_rx_hash_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_hash_set(struct xdp_meta_generic *md, u32 val);
+u32 xdp_meta_rx_csum_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_csum_set(struct xdp_meta_generic *md, u32 val);
+u16 xdp_meta_rx_vid_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_vid_set(struct xdp_meta_generic *md, u16 val);
+u16 xdp_meta_rx_qid_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_qid_set(struct xdp_meta_generic *md, u16 val);
+u32 xdp_meta_rx_flags_get(const struct xdp_meta_generic *md);
+void xdp_meta_rx_flags_set(struct xdp_meta_generic *md, u32 val);
+#endif
+XDP_META_BUILD_ACC(rx, tstamp, 64);
+XDP_META_BUILD_ACC(rx, hash, 32);
+XDP_META_BUILD_ACC(rx, csum, 32);
+XDP_META_BUILD_ACC(rx, vid, 16);
+XDP_META_BUILD_ACC(rx, qid, 16);
+XDP_META_BUILD_ACC(rx, flags, 32);
+
+#if 0
+u32 xdp_meta_btf_id_get(const struct xdp_meta_generic *md);
+void xdp_meta_btf_id_set(struct xdp_meta_generic *md, u32 val);
+u32 xdp_meta_type_id_get(const struct xdp_meta_generic *md);
+void xdp_meta_type_id_set(struct xdp_meta_generic *md, u32 val);
+u64 xdp_meta_full_id_get(const struct xdp_meta_generic *md);
+void xdp_meta_full_id_set(struct xdp_meta_generic *md, u64 val);
+u16 xdp_meta_magic_id_get(const struct xdp_meta_generic *md);
+void xdp_meta_magic_id_set(struct xdp_meta_generic *md, u16 val);
+#endif
+XDP_META_BUILD_ACC(btf, id, 32);
+XDP_META_BUILD_ACC(type, id, 32);
+XDP_META_BUILD_ACC(full, id, 64);
+XDP_META_BUILD_ACC(magic, id, 16);
+
+/* This allows jumping from xdp_meta_generic::{tx,rx_full,rx,id} to the
+ * parent if needed. For example, declare one of them on the stack for
+ * convenience and still pass a generic pointer.
+ * No out-of-bounds checks, a caller must sanitize it on its side.
+ */
+#define _to_gen_md(ptr, locptr, locmd) ({				\
+	struct xdp_meta_generic *locmd;					\
+	typeof(ptr) locptr = (ptr);					\
+									\
+	if (__same_type(*locptr, typeof(locmd->tx)))			\
+		locmd = (void *)locptr - offsetof(typeof(*locmd), tx);	\
+	else if (__same_type(*locptr, typeof(locmd->rx_full)))		\
+		locmd = (void *)locptr - offsetof(typeof(*locmd), rx_full); \
+	else if (__same_type(*locptr, typeof(locmd->rx)))		\
+		locmd = (void *)locptr - offsetof(typeof(*locmd), rx);	\
+	else if (__same_type(*locptr, typeof(locmd->id)))		\
+		locmd = (void *)locptr - offsetof(typeof(*locmd), id);	\
+	else if (__same_type(*locptr, typeof(locmd)) ||			\
+		 __same_type(*locptr, void))				\
+		locmd = (void *)locptr;					\
+	else								\
+		BUILD_BUG();						\
+									\
+	locmd;								\
+})
+#define to_gen_md(ptr) _to_gen_md((ptr), __UNIQUE_ID(ptr_), __UNIQUE_ID(md_))
+
+/* This allows passing an xdp_meta_generic pointer instead of an
+ * xdp_meta_generic::rx{,_full} pointer for convenience.
+ */
+#define _to_rx_md(ptr, locptr, locmd) ({				\
+	struct xdp_meta_generic_rx *locmd;				\
+	typeof(ptr) locptr = (ptr);					\
+									\
+	if (__same_type(*locptr, struct xdp_meta_generic_rx))		\
+		locmd = (struct xdp_meta_generic_rx *)locptr;		\
+	else if (__same_type(*locptr, struct xdp_meta_generic) ||	\
+		 __same_type(*locptr, void))				\
+		locmd = &((struct xdp_meta_generic *)locptr)->rx_full;	\
+	else								\
+		BUILD_BUG();						\
+									\
+	locmd;								\
+})
+#define to_rx_md(ptr) _to_rx_md((ptr), __UNIQUE_ID(ptr_), __UNIQUE_ID(md_))
+
+/**
+ * xdp_meta_generic_ptr - get a pointer to the generic metadata before a frame
+ * @data: a pointer to the beginning of the frame
+ *
+ * Note: the function does not perform any access sanity checks, they should
+ * be done manually prior to calling it.
+ *
+ * Returns a pointer to the beginning of the generic metadata.
+ */
+static inline struct xdp_meta_generic *xdp_meta_generic_ptr(const void *data)
+{
+	BUILD_BUG_ON(xdp_metalen_invalid(sizeof(struct xdp_meta_generic)));
+
+	return (void *)data - sizeof(struct xdp_meta_generic);
+}
+
+/**
+ * xdp_meta_has_generic - check whether a frame has a generic meta in front
+ * @data: a pointer to the beginning of the frame
+ *
+ * Returns true if it does, false otherwise.
+ */
+static inline bool xdp_meta_has_generic(const void *data)
+{
+	return xdp_meta_generic_ptr(data)->magic_id ==
+	       cpu_to_le16(XDP_META_GENERIC_MAGIC);
+}
+
+/**
+ * xdp_meta_skb_has_generic - check whether an skb has a generic meta
+ * @skb: a pointer to the &sk_buff
+ *
+ * Note: must be called only when skb_mac_header_was_set(skb) == true.
+ *
+ * Returns true if it does, false otherwise.
+ */
+static inline bool xdp_meta_skb_has_generic(const struct sk_buff *skb)
+{
+	return xdp_meta_has_generic(skb_metadata_end(skb));
+}
+
 #endif /* __LINUX_NET_XDP_META_H__ */

From patchwork Tue Jun 28 19:47:46 2022
From: Alexander Lobakin
Subject: [PATCH RFC bpf-next 26/52] bpf, btf: add a pair of functions to work with the BTF ID + type ID pair
Date: Tue, 28 Jun 2022 21:47:46 +0200
Message-Id: <20220628194812.1453059-27-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
X-Patchwork-State: RFC

Add a kernel counterpart to libbpf_get_type_btf_id() to easily get the pair of BTF ID << 32 | type ID for the provided type. Drivers and the XDP core will use it to handle different XDP generic metadata formats.
Also add a function to return the index of the matching type string (e.g. "struct foo") from an array of such strings for a given BTF ID + type ID pair. The intention is to be able to quickly identify an ID received from somewhere else and to assign one's own constant identifiers to the supported types. That is, instead of:

	priv->foo_id = bpf_get_type_btf_id("struct foo");
	priv->bar_id = bpf_get_type_btf_id("struct bar");

	[...]

	if (id == priv->foo_id)
		do_smth_for_foo();
	else if (id == priv->bar_id)
		do_smth_for_bar();
	else
		unsupp();

one can do:

	const char * const supp[] = {
		[FOO_ID] = "struct foo",
		[BAR_ID] = "struct bar",
		NULL, // serves as a terminator, can be ""
	};

	[...]

	type = bpf_match_type_btf_id(supp, id);

	switch (type) {
	case FOO_ID:
		do_smth_for_foo();
		break;
	case BAR_ID:
		do_smth_for_bar();
		break;
	default:
		unsupp();
		break;
	}

Aux functions:

* btf_kind_from_str(): returns the kind of the provided full type string and strips the kind identifier, so the remainder can e.g. be passed directly to btf_find_by_name_kind(). For example, "struct foo" becomes "foo" and the return value is BTF_KIND_STRUCT.
* btf_get_by_id(): a shorthand to quickly get the BTF by its ID, factored out from btf_get_fd_by_id().
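On the kernel side, the resolved u64 then splits back into its halves with the existing upper_32_bits()/lower_32_bits() helpers. A minimal usage sketch (illustrative only; the example_* names are made up):

	#include <linux/btf.h>
	#include <linux/kernel.h>

	static u64 example_meta_id;	/* cached at init time */

	static int __init example_resolve_meta_id(void)
	{
		int err;

		/* Writes `btf_id << 32 | type_id` on success */
		err = bpf_get_type_btf_id("struct xdp_meta_generic",
					  &example_meta_id);
		if (err)
			return err;

		pr_info("btf_id: %u, type_id: %u\n",
			upper_32_bits(example_meta_id),
			lower_32_bits(example_meta_id));

		return 0;
	}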
Signed-off-by: Alexander Lobakin --- include/linux/btf.h | 13 +++++ kernel/bpf/btf.c | 133 ++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 140 insertions(+), 6 deletions(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index 1bfed7fa0428..36bc9c499409 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -386,6 +386,8 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id); int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt, struct module *owner); +int bpf_get_type_btf_id(const char *type, u64 *res_id); +int bpf_match_type_btf_id(const char * const *list, u64 id); #else static inline const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id) @@ -418,6 +420,17 @@ static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dt { return 0; } +static inline int bpf_get_type_btf_id(const char *type, u64 *res_id) +{ + if (res_id) + *res_id = 0; + + return -ENOSYS; +} +static inline int bpf_match_type_btf_id(const char * const *list, u64 id) +{ + return -ENOSYS; +} #endif #endif diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 2e2066d6af94..dc316c43a348 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -317,6 +317,28 @@ const char *btf_type_str(const struct btf_type *t) return btf_kind_str[BTF_INFO_KIND(t->info)]; } +static u32 btf_kind_from_str(const char **type) +{ + const char *pos, *orig = *type; + u32 kind; + int len; + + pos = strchr(orig, ' '); + if (pos) { + len = pos - orig; + *type = pos + 1; + } else { + len = strlen(orig); + } + + for (kind = BTF_KIND_UNKN; kind < NR_BTF_KINDS; kind++) { + if (!strncasecmp(orig, btf_kind_str[kind], len)) + break; + } + + return kind < NR_BTF_KINDS ? kind : BTF_KIND_UNKN; +} + /* Chunk size we use in safe copy of data to be shown. */ #define BTF_SHOW_OBJ_SAFE_SIZE 32 @@ -579,6 +601,110 @@ static s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p) return ret; } +/** + * bpf_get_type_btf_id - get the pair BTF ID + type ID for a given type + * @type: pointer to the name of the type to look for + * @res_id: pointer to write the result to + * + * Tries to find the BTF corresponding to the provided type (full string) and + * write the pair of BTF ID << 32 | type ID. Such coded __u64 are being used + * in XDP generic-compatible metadata to distinguish between different + * metadata structures. + * @res_id can be %NULL to only check if a particular type exists within + * the BTF. + * + * Returns 0 in case of success, an error code otherwise. 
+ */ +int bpf_get_type_btf_id(const char *type, u64 *res_id) +{ + struct btf *btf = NULL; + s32 type_id; + u32 kind; + + if (res_id) + *res_id = 0; + + if (!type || !*type) + return -EINVAL; + + kind = btf_kind_from_str(&type); + + type_id = bpf_find_btf_id(type, kind, &btf); + if (type_id > 0 && res_id) + *res_id = ((u64)btf_obj_id(btf) << 32) | type_id; + + btf_put(btf); + + return min(type_id, 0); +} +EXPORT_SYMBOL_GPL(bpf_get_type_btf_id); + +static struct btf *btf_get_by_id(u32 id) +{ + struct btf *btf; + + rcu_read_lock(); + btf = idr_find(&btf_idr, id); + if (!btf || !refcount_inc_not_zero(&btf->refcnt)) + btf = ERR_PTR(-ENOENT); + rcu_read_unlock(); + + return btf; +} + +/** + * bpf_match_type_btf_id - find a type name corresponding to a given full ID + * @list: pointer to the %NULL-terminated list of type names + * @id: full ID (BTF ID + type ID) of the type to look + * + * Do the opposite to what bpf_get_type_btf_id() does: looks over the + * candidates in %NULL-terminated @list and tries to find a match for + * the given ID. If found, returns its index. + * + * Returns a string array element index on success, an error code otherwise. + */ +int bpf_match_type_btf_id(const char * const *list, u64 id) +{ + const struct btf_type *t; + int ret = -ENOENT; + const char *name; + struct btf *btf; + u32 kind; + + btf = btf_get_by_id(upper_32_bits(id)); + if (IS_ERR(btf)) + return PTR_ERR(btf); + + t = btf_type_by_id(btf, lower_32_bits(id)); + if (!t) + goto err_put; + + name = btf_name_by_offset(btf, t->name_off); + if (!name) { + ret = -EINVAL; + goto err_put; + } + + kind = BTF_INFO_KIND(t->info); + + for (u32 i = 0; ; i++) { + const char *cand = list[i]; + + if (!cand) + break; + + if (btf_kind_from_str(&cand) == kind && !strcmp(cand, name)) { + ret = i; + break; + } + } + +err_put: + btf_put(btf); + + return ret; +} + const struct btf_type *btf_type_skip_modifiers(const struct btf *btf, u32 id, u32 *res_id) { @@ -6804,12 +6930,7 @@ int btf_get_fd_by_id(u32 id) struct btf *btf; int fd; - rcu_read_lock(); - btf = idr_find(&btf_idr, id); - if (!btf || !refcount_inc_not_zero(&btf->refcnt)) - btf = ERR_PTR(-ENOENT); - rcu_read_unlock(); - + btf = btf_get_by_id(id); if (IS_ERR(btf)) return PTR_ERR(btf); From patchwork Tue Jun 28 19:47:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898872 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57AC7CCA47F for ; Tue, 28 Jun 2022 19:53:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229627AbiF1TxQ (ORCPT ); Tue, 28 Jun 2022 15:53:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231462AbiF1Tuz (ORCPT ); Tue, 28 Jun 2022 15:50:55 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DACFB24F00; Tue, 28 Jun 2022 12:49:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445776; x=1687981776; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
From: Alexander Lobakin
Subject: [PATCH RFC bpf-next 27/52] net, xdp: add &sk_buff <-> &xdp_meta_generic converters
Date: Tue, 28 Jun 2022 21:47:47 +0200
Message-Id: <20220628194812.1453059-28-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
X-Patchwork-State: RFC

Add two functions (with their underscored versions) to pass HW-originated info (checksums, hashes, Rx queue ID, etc.) from an skb to an XDP generic metadata and vice versa. They can be used to carry that info between hardware, xdp_buff/xdp_frame and sk_buff.
The &sk_buff -> &xdp_meta_generic converter uses a static, init-time-filled ID (&xdp_meta_generic_id) to not query the BTF info on the hotpath. For the fields whose values are assigned directly, make sure they match with the help of static asserts.
Also add xdp_meta_match_id(), a wrapper around bpf_match_type_btf_id() designed especially for drivers, which takes care of corner cases.

Signed-off-by: Alexander Lobakin
---
 include/net/xdp_meta.h | 112 +++++++++++++++++++++++++++++++
 net/bpf/core.c         | 148 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 259 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp_meta.h b/include/net/xdp_meta.h
index f61831e39eb0..d37ea873a6a8 100644
--- a/include/net/xdp_meta.h
+++ b/include/net/xdp_meta.h
@@ -46,6 +46,17 @@ static inline bool xdp_metalen_invalid(unsigned long metalen)
 	return (metalen & (sizeof(u32) - 1)) || metalen > max;
 }
 
+/* We use direct assignments from &xdp_meta_generic to &sk_buff fields,
+ * thus they must match.
+ */ +static_assert((u32)XDP_META_RX_CSUM_NONE == (u32)CHECKSUM_NONE); +static_assert((u32)XDP_META_RX_CSUM_OK == (u32)CHECKSUM_UNNECESSARY); +static_assert((u32)XDP_META_RX_CSUM_COMP == (u32)CHECKSUM_COMPLETE); +static_assert((u32)XDP_META_RX_HASH_NONE == (u32)PKT_HASH_TYPE_NONE); +static_assert((u32)XDP_META_RX_HASH_L2 == (u32)PKT_HASH_TYPE_L2); +static_assert((u32)XDP_META_RX_HASH_L3 == (u32)PKT_HASH_TYPE_L3); +static_assert((u32)XDP_META_RX_HASH_L4 == (u32)PKT_HASH_TYPE_L4); + /* This builds _get(), _set() and _rep() for each bitfield. * If you know for sure the field is empty (e.g. you zeroed the struct * previously), use faster _set() op to save several cycles, otherwise @@ -283,4 +294,105 @@ static inline bool xdp_meta_skb_has_generic(const struct sk_buff *skb) return xdp_meta_has_generic(skb_metadata_end(skb)); } +/** + * xdp_meta_init - initialize a metadata structure + * @md: pointer to xdp_meta_generic or its ::rx_full or its ::id member + * @id: full BTF + type ID for the metadata type (can be u* or __le64) + * + * Zeroes the passed metadata struct (or part) and initializes its tail, so + * it becomes ready for further processing. If a driver is responsible for + * composing metadata, it is important to zero the space it occupies in each + * Rx buffer as `xdp->data - xdp->data_hard_start` doesn't get initialized + * by default. + */ +#define _xdp_meta_init(md, id, locmd, locid) ({ \ + typeof(md) locmd = (md); \ + typeof(id) locid = (id); \ + \ + if (offsetof(typeof(*locmd), full_id)) \ + memset(locmd, 0, offsetof(typeof(*locmd), full_id)); \ + \ + locmd->full_id = __same_type(locid, __le64) ? (__force __le64)locid : \ + cpu_to_le64((__force u64)locid); \ + locmd->magic_id = cpu_to_le16(XDP_META_GENERIC_MAGIC); \ +}) +#define xdp_meta_init(md, id) \ + _xdp_meta_init((md), (id), __UNIQUE_ID(md_), __UNIQUE_ID(id_)) + +void ___xdp_build_meta_generic_from_skb(struct xdp_meta_generic_rx *rx_md, + const struct sk_buff *skb); +void ___xdp_populate_skb_meta_generic(struct sk_buff *skb, + const struct xdp_meta_generic_rx *rx_md); + +#define _xdp_build_meta_generic_from_skb(md, skb, locmd) ({ \ + typeof(md) locmd = (md); \ + \ + if (offsetof(typeof(*locmd), rx)) \ + memset(locmd, 0, offsetof(typeof(*locmd), rx)); \ + \ + ___xdp_build_meta_generic_from_skb(to_rx_md(locmd), skb); \ +}) +#define __xdp_build_meta_generic_from_skb(md, skb) \ + _xdp_build_meta_generic_from_skb((md), (skb), __UNIQUE_ID(md_)) + +#define __xdp_populate_skb_meta_generic(skb, md) \ + ___xdp_populate_skb_meta_generic((skb), to_rx_md(md)) + +/** + * xdp_build_meta_generic_from_skb - build the generic meta before the skb data + * @skb: a pointer to the &sk_buff + * + * Builds an XDP generic metadata in front of the skb data from its fields. + * Note: skb->mac_header must be set and valid. + */ +static inline void xdp_build_meta_generic_from_skb(struct sk_buff *skb) +{ + struct xdp_meta_generic *md; + u32 needed; + + /* skb_headroom() is `skb->data - skb->head`, i.e. it doesn't account + * for the pulled headers, e.g. MAC header. Metadata resides in front + * of the MAC header, so counting starts from there, not the current + * data pointer position. + * CoW won't happen in here when coming from Generic XDP path as it + * ensures that an skb has at least %XDP_PACKET_HEADROOM beforehand. + * It won't be happening also as long as `sizeof(*md) <= NET_SKB_PAD`. 
+ */ + needed = (void *)skb->data - skb_metadata_end(skb) + sizeof(*md); + if (unlikely(skb_cow_head(skb, needed))) + return; + + md = xdp_meta_generic_ptr(skb_metadata_end(skb)); + __xdp_build_meta_generic_from_skb(md, skb); + + skb_metadata_set(skb, sizeof(*md)); + skb_metadata_nocomp_set(skb); +} + +/** + * xdp_populate_skb_meta_generic - fill an skb from the metadata in front of it + * @skb: a pointer to the &sk_buff + * + * Fills the skb fields from the metadata in front of its MAC header and marks + * its metadata as "non-comparable". + * Note: skb->mac_header must be set and valid. + */ +static inline void xdp_populate_skb_meta_generic(struct sk_buff *skb) +{ + const struct xdp_meta_generic *md; + + if (skb_metadata_len(skb) < sizeof(*md)) + return; + + md = xdp_meta_generic_ptr(skb_metadata_end(skb)); + __xdp_populate_skb_meta_generic(skb, md); + + /* We know at this point that skb metadata may contain + * unique values, mark it as nocomp to not confuse GRO. + */ + skb_metadata_nocomp_set(skb); +} + +int xdp_meta_match_id(const char * const *list, u64 id); + #endif /* __LINUX_NET_XDP_META_H__ */ diff --git a/net/bpf/core.c b/net/bpf/core.c index 18174d6d8687..a8685bcc6e00 100644 --- a/net/bpf/core.c +++ b/net/bpf/core.c @@ -3,7 +3,7 @@ * * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc. */ -#include +#include #include #include #include @@ -713,3 +713,149 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf) return nxdpf; } + +/** + * xdp_meta_match_id - find a type name corresponding to a given full ID + * @list: pointer to the %NULL-terminated list of type names + * @id: full ID (BTF ID + type ID) of the type to look + * + * Convenience wrapper over bpf_match_type_btf_id() for usage in drivers which + * takes care of zeroed ID and BPF syscall being not compiled in (to not break + * code flow and return "no meta"). + * + * Returns a string array element index on success, an error code otherwise. + */ +int xdp_meta_match_id(const char * const *list, u64 id) +{ + int ret; + + if (unlikely(!list || !*list)) + return id ? -EINVAL : 0; + + ret = bpf_match_type_btf_id(list, id); + if (ret == -ENOSYS || !id) { + for (ret = 0; list[ret]; ret++) + ; + } + + return ret; +} +EXPORT_SYMBOL_GPL(xdp_meta_match_id); + +/* Used in __xdp_build_meta_generic_from_skb() to quickly get the ID + * on hotpath. + */ +static __le64 xdp_meta_generic_id __ro_after_init; + +static int __init xdp_meta_generic_id_init(void) +{ + int ret; + u64 id; + + ret = bpf_get_type_btf_id("struct xdp_meta_generic", &id); + xdp_meta_generic_id = cpu_to_le64(id); + + return ret; +} +late_initcall(xdp_meta_generic_id_init); + +#define _xdp_meta_rx_hash_type_from_skb(skb, locskb) ({ \ + typeof(skb) locskb = (skb); \ + \ + likely((locskb)->l4_hash) ? XDP_META_RX_HASH_L4 : \ + skb_get_hash_raw(locskb) ? XDP_META_RX_HASH_L3 : \ + XDP_META_RX_HASH_NONE; \ +}) +#define xdp_meta_rx_hash_type_from_skb(skb) \ + _xdp_meta_rx_hash_type_from_skb((skb), __UNIQUE_ID(skb_)) + +#define xdp_meta_rx_vlan_from_prot(skb) ({ \ + (skb)->vlan_proto == htons(ETH_P_8021Q) ? \ + XDP_META_RX_CVID : XDP_META_RX_SVID; \ +}) + +#define xdp_meta_rx_vlan_to_prot(md) ({ \ + xdp_meta_rx_vlan_type_get(md) == XDP_META_RX_CVID ? 
\
+		htons(ETH_P_8021Q) : htons(ETH_P_8021AD);	\
+})
+
+/**
+ * ___xdp_build_meta_generic_from_skb - fill a generic metadata from an skb
+ * @rx_md: a pointer to the XDP generic metadata to be filled
+ * @skb: a pointer to the skb to take the info from
+ *
+ * Fills a given generic metadata struct with the info set previously in
+ * an skb. @rx_md can point anywhere and the function doesn't use
+ * skb_metadata_{end,len}().
+ */
+void ___xdp_build_meta_generic_from_skb(struct xdp_meta_generic_rx *rx_md,
+					const struct sk_buff *skb)
+{
+	struct xdp_meta_generic *md = to_gen_md(rx_md);
+	ktime_t ts;
+
+	xdp_meta_init(rx_md, xdp_meta_generic_id);
+
+	xdp_meta_rx_csum_level_set(md, skb->csum_level);
+	xdp_meta_rx_csum_status_set(md, skb->ip_summed);
+	xdp_meta_rx_csum_set(md, skb->csum);
+
+	xdp_meta_rx_hash_set(md, skb_get_hash_raw(skb));
+	xdp_meta_rx_hash_type_set(md, xdp_meta_rx_hash_type_from_skb(skb));
+
+	if (likely(skb_rx_queue_recorded(skb))) {
+		xdp_meta_rx_qid_present_set(md, 1);
+		xdp_meta_rx_qid_set(md, skb_get_rx_queue(skb));
+	}
+
+	if (skb_vlan_tag_present(skb)) {
+		xdp_meta_rx_vlan_type_set(md, xdp_meta_rx_vlan_from_prot(skb));
+		xdp_meta_rx_vid_set(md, skb_vlan_tag_get(skb));
+	}
+
+	ts = skb_hwtstamps(skb)->hwtstamp;
+	if (ts) {
+		xdp_meta_rx_tstamp_present_set(md, 1);
+		xdp_meta_rx_tstamp_set(md, ktime_to_ns(ts));
+	}
+}
+EXPORT_SYMBOL_GPL(___xdp_build_meta_generic_from_skb);
+
+/**
+ * ___xdp_populate_skb_meta_generic - fill the skb fields from a generic meta
+ * @skb: a pointer to the skb to be filled
+ * @rx_md: a pointer to the generic metadata to take the values from
+ *
+ * Populates the &sk_buff fields from a given XDP generic metadata. The meta
+ * can be from anywhere, the function doesn't use skb_metadata_{end,len}().
+ * Checks whether the metadata is generic-compatible before accessing other
+ * fields.
+ */
+void ___xdp_populate_skb_meta_generic(struct sk_buff *skb,
+				      const struct xdp_meta_generic_rx *rx_md)
+{
+	const struct xdp_meta_generic *md = to_gen_md(rx_md);
+
+	if (unlikely(!xdp_meta_has_generic(md + 1)))
+		return;
+
+	skb->csum_level = xdp_meta_rx_csum_level_get(md);
+	skb->ip_summed = xdp_meta_rx_csum_status_get(md);
+	skb->csum = xdp_meta_rx_csum_get(md);
+
+	skb_set_hash(skb, xdp_meta_rx_hash_get(md),
+		     xdp_meta_rx_hash_type_get(md));
+
+	if (likely(xdp_meta_rx_qid_present_get(md)))
+		skb_record_rx_queue(skb, xdp_meta_rx_qid_get(md));
+
+	if (xdp_meta_rx_vlan_type_get(md))
+		__vlan_hwaccel_put_tag(skb, xdp_meta_rx_vlan_to_prot(md),
+				       xdp_meta_rx_vid_get(md));
+
+	if (xdp_meta_rx_tstamp_present_get(md))
+		*skb_hwtstamps(skb) = (struct skb_shared_hwtstamps){
+			.hwtstamp = ns_to_ktime(xdp_meta_rx_tstamp_get(md)),
+		};
+}
+EXPORT_SYMBOL_GPL(___xdp_populate_skb_meta_generic);

From patchwork Tue Jun 28 19:47:48 2022
From: Alexander Lobakin
Subject: [PATCH RFC bpf-next 28/52] net, xdp: prefetch data a bit when building an skb from an &xdp_frame
Date: Tue, 28 Jun 2022 21:47:48 +0200
Message-Id: <20220628194812.1453059-29-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
X-Patchwork-State: RFC

Different cpumap tests showed that a couple of small, careful prefetches help performance. The only thing is to not go crazy: just one cacheline to the right of the frame start, and one to the left if there is metadata in front.
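As a worked example of the boundary check below (a sketch, 64-byte cachelines assumed): with the frame start at 0x140 and 16 bytes of metadata, dist = 16 and the two cacheline bases differ, so the metadata line gets pulled in too; at 0x150 they'd share a line and the extra prefetch would be skipped:

	/* to_cl() rounds down to the cacheline base, see the diff below */
	to_cl(0x140 - 16) == 0x100, to_cl(0x140) == 0x140	/* -> prefetch */
	to_cl(0x150 - 16) == 0x140, to_cl(0x150) == 0x140	/* -> skip */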
Signed-off-by: Alexander Lobakin
---
 net/bpf/core.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/net/bpf/core.c b/net/bpf/core.c
index a8685bcc6e00..775f9648e8cf 100644
--- a/net/bpf/core.c
+++ b/net/bpf/core.c
@@ -620,10 +620,26 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct net_device *dev)
 {
 	struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
+	u32 dist, metasize = xdpf->metasize;
 	unsigned int headroom, frame_size;
+	void *data = xdpf->data;
 	void *hard_start;
 	u8 nr_frags;
 
+	/* Bring the headers to the current CPU, as well as the
+	 * metadata if present. This helps eth_type_trans() and
+	 * xdp_populate_skb_meta_generic().
+	 * The idea here is to prefetch no more than 2 cachelines:
+	 * one to the left from the data start and one to the right.
+	 */
+#define to_cl(ptr) PTR_ALIGN_DOWN(ptr, L1_CACHE_BYTES)
+	dist = min_t(typeof(dist), metasize, L1_CACHE_BYTES);
+	if (dist && to_cl(data - dist) != to_cl(data))
+		prefetch(data - dist);
+#undef to_cl
+
+	prefetch(data);
+
 	/* xdp frags frame */
 	if (unlikely(xdp_frame_has_frags(xdpf)))
 		nr_frags = sinfo->nr_frags;
@@ -636,15 +652,15 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	 */
 	frame_size = xdpf->frame_sz;
 
-	hard_start = xdpf->data - headroom;
+	hard_start = data - headroom;
 	skb = build_skb_around(skb, hard_start, frame_size);
 	if (unlikely(!skb))
 		return NULL;
 
 	skb_reserve(skb, headroom);
 	__skb_put(skb, xdpf->len);
-	if (xdpf->metasize)
-		skb_metadata_set(skb, xdpf->metasize);
+	if (metasize)
+		skb_metadata_set(skb, metasize);
 
 	if (unlikely(xdp_frame_has_frags(xdpf)))
 		xdp_update_skb_shared_info(skb, nr_frags,

From patchwork Tue Jun 28 19:47:49 2022
From: Alexander Lobakin
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 29/52] net, xdp: try to fill skb fields when converting from an &xdp_frame Date: Tue, 28 Jun 2022 21:47:49 +0200 Message-Id: <20220628194812.1453059-30-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC In __xdp_build_skb_from_frame(), if there's a metadata in front of the data, check if it's a generic-compatible metadata and try to populate the HW-originated skb fields: checksum status, hash etc. As xdp_populate_skb_meta_generic() requires the skb->mac_header to be set and valid, call the skb_reset_mac_header() first, as skb->data at this point is pointing (sic!) to the MAC header. The two most obvious users are cpumap and veth. Signed-off-by: Alexander Lobakin --- net/bpf/core.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/net/bpf/core.c b/net/bpf/core.c index 775f9648e8cf..d2d01b8e6441 100644 --- a/net/bpf/core.c +++ b/net/bpf/core.c @@ -659,8 +659,11 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, skb_reserve(skb, headroom); __skb_put(skb, xdpf->len); - if (metasize) + if (metasize) { + skb_reset_mac_header(skb); skb_metadata_set(skb, metasize); + xdp_populate_skb_meta_generic(skb); + } if (unlikely(xdp_frame_has_frags(xdpf))) xdp_update_skb_shared_info(skb, nr_frags, @@ -671,12 +674,6 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, /* Essential SKB info: protocol and skb->dev */ skb->protocol = eth_type_trans(skb, dev); - /* Optional SKB info, currently missing: - * - HW checksum info (skb->ip_summed) - * - HW RX hash (skb_set_hash) - * - RX ring dev queue index (skb_record_rx_queue) - */ - /* Until page_pool get SKB return path, release DMA here */ xdp_release_frame(xdpf); From patchwork Tue Jun 28 19:47:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898873 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FF2ECCA481 for ; Tue, 28 Jun 2022 19:53:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231652AbiF1TxZ (ORCPT ); Tue, 28 Jun 2022 15:53:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231718AbiF1Tuz (ORCPT ); Tue, 28 Jun 2022 15:50:55 -0400 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 25BE6B32; Tue, 28 Jun 2022 12:49:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445781; x=1687981781; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=n8tgrMr0VAIA81agqiL4xe/2c0TDcto2yGzG790rA4Y=; 
From patchwork Tue Jun 28 19:47:50 2022

Subject: [PATCH RFC bpf-next 30/52] net, gro: decouple GRO from the NAPI layer
From: Alexander Lobakin
Date: Tue, 28 Jun 2022 21:47:50 +0200
Message-Id: <20220628194812.1453059-31-alexandr.lobakin@intel.com>

In fact, these two are not tied closely to each other. The only requirements for GRO are to run it in the BH context and to keep some sane limit on the packet batch size; e.g. NAPI has its budget limit (64/8/etc.).
Factor the purely-GRO fields out into a new structure, &gro_node. Embed it into &napi_struct and adjust all the references.
::timer was moved as well, because it is tied more to GRO than to NAPI: the former relies on it when deciding whether to do a full or a partial flush.
This does not make GRO ready for use outside of the NAPI context yet.
Signed-off-by: Alexander Lobakin
---
 drivers/net/ethernet/brocade/bna/bnad.c |  1 +
 drivers/net/ethernet/cortina/gemini.c   |  1 +
 include/linux/netdevice.h               | 19 ++++---
 include/net/gro.h                       | 35 ++++++++----
 net/core/dev.c                          | 75 +++++++++++--------------
 net/core/gro.c                          | 63 ++++++++++-----------
 6 files changed, 103 insertions(+), 91 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index f6fe08df568b..8bcae1616b15 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #include "bnad.h"
 #include "bna.h"
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 9e6de2f968fa..6f208ce457dd 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 
 #include "gemini.h"
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bc2d82a3d0de..60df42b3f116 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -318,11 +318,19 @@ struct gro_list {
 };
 
 /*
- * size of gro hash buckets, must less than bit number of
- * napi_struct::gro_bitmask
+ * size of gro hash buckets, must be <= the number of bits in
+ * gro_node::bitmask
  */
 #define GRO_HASH_BUCKETS	8
 
+struct gro_node {
+	unsigned long		bitmask;		/* Mask of used buckets */
+	struct gro_list		hash[GRO_HASH_BUCKETS];	/* Pending GRO skbs */
+	struct list_head	rx_list;		/* Pending GRO_NORMAL skbs */
+	int			rx_count;		/* Length of rx_list */
+	struct hrtimer		timer;			/* Timer for deferred flush */
+};
+
 /*
  * Structure for NAPI scheduling similar to tasklet but with weighting
  */
@@ -338,17 +346,13 @@ struct napi_struct {
 	unsigned long		state;
 	int			weight;
 	int			defer_hard_irqs_count;
-	unsigned long		gro_bitmask;
 	int			(*poll)(struct napi_struct *, int);
 #ifdef CONFIG_NETPOLL
 	int			poll_owner;
 #endif
 	struct net_device	*dev;
-	struct gro_list		gro_hash[GRO_HASH_BUCKETS];
+	struct gro_node		gro;
 	struct sk_buff		*skb;
-	struct list_head	rx_list; /* Pending GRO_NORMAL skbs */
-	int			rx_count; /* length of rx_list */
-	struct hrtimer		timer;
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
@@ -3788,7 +3792,6 @@ int netif_receive_skb_core(struct sk_buff *skb);
 void netif_receive_skb_list_internal(struct list_head *head);
 void netif_receive_skb_list(struct list_head *head);
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
-void napi_gro_flush(struct napi_struct *napi, bool flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
diff --git a/include/net/gro.h b/include/net/gro.h
index 867656b0739c..75211ebd8765 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -421,26 +421,41 @@ static inline __wsum ip6_gro_compute_pseudo(struct sk_buff *skb, int proto)
 }
 
 int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
+void __gro_flush(struct gro_node *gro, bool flush_old);
+
+static inline void gro_flush(struct gro_node *gro, bool flush_old)
+{
+	if (!gro->bitmask)
+		return;
+
+	__gro_flush(gro, flush_old);
+}
+
+static inline void napi_gro_flush(struct napi_struct *napi, bool flush_old)
+{
+	gro_flush(&napi->gro, flush_old);
+}
 
 /* Pass the currently batched GRO_NORMAL SKBs up to the stack.
 */
-static inline void gro_normal_list(struct napi_struct *napi)
+static inline void gro_normal_list(struct gro_node *gro)
 {
-	if (!napi->rx_count)
+	if (!gro->rx_count)
 		return;
-	netif_receive_skb_list_internal(&napi->rx_list);
-	INIT_LIST_HEAD(&napi->rx_list);
-	napi->rx_count = 0;
+	netif_receive_skb_list_internal(&gro->rx_list);
+	INIT_LIST_HEAD(&gro->rx_list);
+	gro->rx_count = 0;
 }
 
 /* Queue one GRO_NORMAL SKB up for list processing. If batch size exceeded,
  * pass the whole batch up to the stack.
  */
-static inline void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb, int segs)
+static inline void gro_normal_one(struct gro_node *gro, struct sk_buff *skb,
+				  int segs)
 {
-	list_add_tail(&skb->list, &napi->rx_list);
-	napi->rx_count += segs;
-	if (napi->rx_count >= gro_normal_batch)
-		gro_normal_list(napi);
+	list_add_tail(&skb->list, &gro->rx_list);
+	gro->rx_count += segs;
+	if (gro->rx_count >= gro_normal_batch)
+		gro_normal_list(gro);
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 52b64d24c439..8b334aa974c2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5765,7 +5765,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 		return false;
 
 	if (work_done) {
-		if (n->gro_bitmask)
+		if (n->gro.bitmask)
 			timeout = READ_ONCE(n->dev->gro_flush_timeout);
 		n->defer_hard_irqs_count = READ_ONCE(n->dev->napi_defer_hard_irqs);
 	}
@@ -5775,15 +5775,13 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 		if (timeout)
 			ret = false;
 	}
-	if (n->gro_bitmask) {
-		/* When the NAPI instance uses a timeout and keeps postponing
-		 * it, we need to bound somehow the time packets are kept in
-		 * the GRO layer
-		 */
-		napi_gro_flush(n, !!timeout);
-	}
-
-	gro_normal_list(n);
+	/* When the NAPI instance uses a timeout and keeps postponing
+	 * it, we need to bound somehow the time packets are kept in
+	 * the GRO layer
+	 */
+	gro_flush(&n->gro, !!timeout);
+	gro_normal_list(&n->gro);
 
 	if (unlikely(!list_empty(&n->poll_list))) {
 		/* If n->poll_list is not empty, we need to mask irqs */
@@ -5815,7 +5813,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 	}
 
 	if (timeout)
-		hrtimer_start(&n->timer, ns_to_ktime(timeout),
+		hrtimer_start(&n->gro.timer, ns_to_ktime(timeout),
 			      HRTIMER_MODE_REL_PINNED);
 	return ret;
 }
@@ -5839,19 +5837,17 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
 static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
 {
 	if (!skip_schedule) {
-		gro_normal_list(napi);
+		gro_normal_list(&napi->gro);
 		__napi_schedule(napi);
 		return;
 	}
 
-	if (napi->gro_bitmask) {
-		/* flush too old packets
-		 * If HZ < 1000, flush all packets.
-		 */
-		napi_gro_flush(napi, HZ >= 1000);
-	}
+	/* flush too old packets
+	 * If HZ < 1000, flush all packets.
+	 */
+	gro_flush(&napi->gro, HZ >= 1000);
+	gro_normal_list(&napi->gro);
 
-	gro_normal_list(napi);
 	clear_bit(NAPI_STATE_SCHED, &napi->state);
 }
@@ -5880,7 +5876,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
 		napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs);
 		timeout = READ_ONCE(napi->dev->gro_flush_timeout);
 		if (napi->defer_hard_irqs_count && timeout) {
-			hrtimer_start(&napi->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
+			hrtimer_start(&napi->gro.timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
 			skip_schedule = true;
 		}
 	}
@@ -5947,7 +5943,7 @@ void napi_busy_loop(unsigned int napi_id,
 		}
 		work = napi_poll(napi, budget);
 		trace_napi_poll(napi, work, budget);
-		gro_normal_list(napi);
+		gro_normal_list(&napi->gro);
count:
 		if (work > 0)
 			__NET_ADD_STATS(dev_net(napi->dev),
@@ -6015,7 +6011,7 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 {
 	struct napi_struct *napi;
 
-	napi = container_of(timer, struct napi_struct, timer);
+	napi = container_of(timer, struct napi_struct, gro.timer);
 
 	/* Note : we use a relaxed variant of napi_schedule_prep() not setting
 	 * NAPI_STATE_MISSED, since we do not react to a device IRQ.
@@ -6034,10 +6030,10 @@ static void init_gro_hash(struct napi_struct *napi)
 	int i;
 
 	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
-		INIT_LIST_HEAD(&napi->gro_hash[i].list);
-		napi->gro_hash[i].count = 0;
+		INIT_LIST_HEAD(&napi->gro.hash[i].list);
+		napi->gro.hash[i].count = 0;
 	}
-	napi->gro_bitmask = 0;
+	napi->gro.bitmask = 0;
 }
 
 int dev_set_threaded(struct net_device *dev, bool threaded)
@@ -6109,12 +6105,12 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	INIT_LIST_HEAD(&napi->poll_list);
 	INIT_HLIST_NODE(&napi->napi_hash_node);
-	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
-	napi->timer.function = napi_watchdog;
+	hrtimer_init(&napi->gro.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+	napi->gro.timer.function = napi_watchdog;
 	init_gro_hash(napi);
 	napi->skb = NULL;
-	INIT_LIST_HEAD(&napi->rx_list);
-	napi->rx_count = 0;
+	INIT_LIST_HEAD(&napi->gro.rx_list);
+	napi->gro.rx_count = 0;
 	napi->poll = poll;
 	if (weight > NAPI_POLL_WEIGHT)
 		netdev_err_once(dev, "%s() called with weight %d\n", __func__,
@@ -6159,7 +6155,7 @@ void napi_disable(struct napi_struct *n)
 			break;
 	}
 
-	hrtimer_cancel(&n->timer);
+	hrtimer_cancel(&n->gro.timer);
 
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }
@@ -6194,9 +6190,9 @@ static void flush_gro_hash(struct napi_struct *napi)
 	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
 		struct sk_buff *skb, *n;
 
-		list_for_each_entry_safe(skb, n, &napi->gro_hash[i].list, list)
+		list_for_each_entry_safe(skb, n, &napi->gro.hash[i].list, list)
 			kfree_skb(skb);
-		napi->gro_hash[i].count = 0;
+		napi->gro.hash[i].count = 0;
 	}
 }
 
@@ -6211,7 +6207,7 @@ void __netif_napi_del(struct napi_struct *napi)
 
 	napi_free_frags(napi);
 	flush_gro_hash(napi);
-	napi->gro_bitmask = 0;
+	napi->gro.bitmask = 0;
 
 	if (napi->thread) {
 		kthread_stop(napi->thread);
@@ -6268,14 +6264,11 @@ static int __napi_poll(struct napi_struct *n, bool *repoll)
 		return work;
 	}
 
-	if (n->gro_bitmask) {
-		/* flush too old packets
-		 * If HZ < 1000, flush all packets.
-		 */
-		napi_gro_flush(n, HZ >= 1000);
-	}
-
-	gro_normal_list(n);
+	/* flush too old packets
+	 * If HZ < 1000, flush all packets.
+	 */
+	gro_flush(&n->gro, HZ >= 1000);
+	gro_normal_list(&n->gro);
 
 	/* Some drivers may have called napi_schedule
 	 * prior to exhausting their budget.
@@ -10396,7 +10389,7 @@ static struct hlist_head * __net_init netdev_create_hash(void)
 static int __net_init netdev_init(struct net *net)
 {
 	BUILD_BUG_ON(GRO_HASH_BUCKETS >
-		     8 * sizeof_field(struct napi_struct, gro_bitmask));
+		     BITS_PER_BYTE * sizeof_field(struct gro_node, bitmask));
 
 	INIT_LIST_HEAD(&net->dev_base_head);
diff --git a/net/core/gro.c b/net/core/gro.c
index b4190eb08467..67fd587a87c9 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -278,8 +278,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	return 0;
 }
 
-
-static void napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb)
+static void gro_complete(struct gro_node *gro, struct sk_buff *skb)
 {
 	struct packet_offload *ptype;
 	__be16 type = skb->protocol;
@@ -312,43 +311,42 @@ static void napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb)
 	}
 
out:
-	gro_normal_one(napi, skb, NAPI_GRO_CB(skb)->count);
+	gro_normal_one(gro, skb, NAPI_GRO_CB(skb)->count);
 }
 
-static void __napi_gro_flush_chain(struct napi_struct *napi, u32 index,
-				   bool flush_old)
+static void __gro_flush_chain(struct gro_node *gro, u32 index, bool flush_old)
 {
-	struct list_head *head = &napi->gro_hash[index].list;
+	struct list_head *head = &gro->hash[index].list;
 	struct sk_buff *skb, *p;
 
 	list_for_each_entry_safe_reverse(skb, p, head, list) {
 		if (flush_old && NAPI_GRO_CB(skb)->age == jiffies)
 			return;
 		skb_list_del_init(skb);
-		napi_gro_complete(napi, skb);
-		napi->gro_hash[index].count--;
+		gro_complete(gro, skb);
+		gro->hash[index].count--;
 	}
 
-	if (!napi->gro_hash[index].count)
-		__clear_bit(index, &napi->gro_bitmask);
+	if (!gro->hash[index].count)
+		__clear_bit(index, &gro->bitmask);
 }
 
-/* napi->gro_hash[].list contains packets ordered by age.
+/* gro->hash[].list contains packets ordered by age.
  * youngest packets at the head of it.
  * Complete skbs in reverse order to reduce latencies.
  */
-void napi_gro_flush(struct napi_struct *napi, bool flush_old)
+void __gro_flush(struct gro_node *gro, bool flush_old)
 {
-	unsigned long bitmask = napi->gro_bitmask;
+	unsigned long bitmask = gro->bitmask;
 	unsigned int i, base = ~0U;
 
 	while ((i = ffs(bitmask)) != 0) {
 		bitmask >>= i;
 		base += i;
-		__napi_gro_flush_chain(napi, base, flush_old);
+		__gro_flush_chain(gro, base, flush_old);
 	}
 }
-EXPORT_SYMBOL(napi_gro_flush);
+EXPORT_SYMBOL(__gro_flush);
 
 static void gro_list_prepare(const struct list_head *head,
 			     const struct sk_buff *skb)
@@ -449,7 +447,7 @@ static void gro_pull_from_frag0(struct sk_buff *skb, int grow)
 	}
 }
 
-static void gro_flush_oldest(struct napi_struct *napi, struct list_head *head)
+static void gro_flush_oldest(struct gro_node *gro, struct list_head *head)
 {
 	struct sk_buff *oldest;
 
@@ -465,13 +463,14 @@ static void gro_flush_oldest(struct napi_struct *napi, struct list_head *head)
 	 * SKB to the chain.
 	 */
 	skb_list_del_init(oldest);
-	napi_gro_complete(napi, oldest);
+	gro_complete(gro, oldest);
 }
 
-static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
+static enum gro_result dev_gro_receive(struct gro_node *gro,
+				       struct sk_buff *skb)
 {
 	u32 bucket = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
-	struct gro_list *gro_list = &napi->gro_hash[bucket];
+	struct gro_list *gro_list = &gro->hash[bucket];
 	struct list_head *head = &offload_base;
 	struct packet_offload *ptype;
 	__be16 type = skb->protocol;
@@ -530,7 +529,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 
 	if (pp) {
 		skb_list_del_init(pp);
-		napi_gro_complete(napi, pp);
+		gro_complete(gro, pp);
 		gro_list->count--;
 	}
 
@@ -541,7 +540,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 		goto normal;
 
 	if (unlikely(gro_list->count >= MAX_GRO_SKBS))
-		gro_flush_oldest(napi, &gro_list->list);
+		gro_flush_oldest(gro, &gro_list->list);
 	else
 		gro_list->count++;
 
@@ -558,10 +557,10 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	gro_pull_from_frag0(skb, grow);
ok:
 	if (gro_list->count) {
-		if (!test_bit(bucket, &napi->gro_bitmask))
-			__set_bit(bucket, &napi->gro_bitmask);
-	} else if (test_bit(bucket, &napi->gro_bitmask)) {
-		__clear_bit(bucket, &napi->gro_bitmask);
+		if (!test_bit(bucket, &gro->bitmask))
+			__set_bit(bucket, &gro->bitmask);
+	} else if (test_bit(bucket, &gro->bitmask)) {
+		__clear_bit(bucket, &gro->bitmask);
 	}
 
 	return ret;
@@ -599,13 +598,12 @@ struct packet_offload *gro_find_complete_by_type(__be16 type)
 }
 EXPORT_SYMBOL(gro_find_complete_by_type);
 
-static gro_result_t napi_skb_finish(struct napi_struct *napi,
-				    struct sk_buff *skb,
-				    gro_result_t ret)
+static gro_result_t gro_skb_finish(struct gro_node *gro, struct sk_buff *skb,
+				   gro_result_t ret)
 {
 	switch (ret) {
 	case GRO_NORMAL:
-		gro_normal_one(napi, skb, 1);
+		gro_normal_one(gro, skb, 1);
 		break;
 
 	case GRO_MERGED_FREE:
@@ -628,6 +626,7 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi,
 
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
+	struct gro_node *gro = &napi->gro;
 	gro_result_t ret;
 
 	skb_mark_napi_id(skb, napi);
@@ -635,7 +634,7 @@ gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 
 	skb_gro_reset_offset(skb, 0);
 
-	ret = napi_skb_finish(napi, skb, dev_gro_receive(napi, skb));
+	ret = gro_skb_finish(gro, skb, dev_gro_receive(gro, skb));
 	trace_napi_gro_receive_exit(ret);
 
 	return ret;
@@ -695,7 +694,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi,
 		__skb_push(skb, ETH_HLEN);
 		skb->protocol = eth_type_trans(skb, skb->dev);
 		if (ret == GRO_NORMAL)
-			gro_normal_one(napi, skb, 1);
+			gro_normal_one(&napi->gro, skb, 1);
 		break;
 
 	case GRO_MERGED_FREE:
@@ -761,7 +760,7 @@ gro_result_t napi_gro_frags(struct napi_struct *napi)
 
 	trace_napi_gro_frags_entry(skb);
 
-	ret = napi_frags_finish(napi, skb, dev_gro_receive(napi, skb));
+	ret = napi_frags_finish(napi, skb, dev_gro_receive(&napi->gro, skb));
 	trace_napi_gro_frags_exit(ret);
 
 	return ret;
From patchwork Tue Jun 28 19:47:51 2022

Subject: [PATCH RFC bpf-next 31/52] net, gro: expose some GRO API to use outside of NAPI
From: Alexander Lobakin
Date: Tue, 28 Jun 2022 21:47:51 +0200
Message-Id: <20220628194812.1453059-32-alexandr.lobakin@intel.com>

Make several functions global to be able to use GRO without a NAPI instance. This includes the init, cleanup and receive functions, as well as a couple of inlines to start and stop the deferred flush timer.
Together with the already-global gro_flush(), it is now fully possible to maintain a GRO node without an auxiliary NAPI entity.
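As a sketch of what such a standalone user could look like (my_ctx and the RX flow below are hypothetical; only the GRO calls themselves come from this patch and the previous one, and everything has to run in BH context):

struct my_ctx {
	struct gro_node gro;
};

static void my_ctx_init(struct my_ctx *ctx)
{
	gro_init(&ctx->gro, NULL);	/* NULL: no deferred-flush timer */
}

/* called from BH with a list of freshly received skbs */
static void my_ctx_rx(struct my_ctx *ctx, struct list_head *skbs)
{
	gro_receive_skb_list(&ctx->gro, skbs);	/* try to aggregate */
	gro_flush(&ctx->gro, false);		/* no timer here: full flush */
	gro_normal_list(&ctx->gro);		/* hand the batch to the stack */
}

static void my_ctx_free(struct my_ctx *ctx)
{
	gro_cleanup(&ctx->gro);
}

This is essentially the flow cpumap adopts in the next patch.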
Signed-off-by: Alexander Lobakin
---
 include/net/gro.h | 18 +++++++++++++++
 net/core/dev.c    | 45 ++++++-------------------------------
 net/core/gro.c    | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 38 deletions(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index 75211ebd8765..539f931e736f 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -421,6 +421,7 @@ static inline __wsum ip6_gro_compute_pseudo(struct sk_buff *skb, int proto)
 }
 
 int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
+void gro_receive_skb_list(struct gro_node *gro, struct list_head *list);
 void __gro_flush(struct gro_node *gro, bool flush_old);
 
 static inline void gro_flush(struct gro_node *gro, bool flush_old)
@@ -458,5 +459,22 @@ static inline void gro_normal_one(struct gro_node *gro, struct sk_buff *skb,
 		gro_normal_list(gro);
 }
 
+static inline void gro_timer_start(struct gro_node *gro, u64 timeout_ns)
+{
+	if (!timeout_ns)
+		return;
+
+	hrtimer_start(&gro->timer, ns_to_ktime(timeout_ns),
+		      HRTIMER_MODE_REL_PINNED);
+}
+
+static inline void gro_timer_cancel(struct gro_node *gro)
+{
+	hrtimer_cancel(&gro->timer);
+}
+
+void gro_init(struct gro_node *gro,
+	      enum hrtimer_restart (*timer_cb)(struct hrtimer *timer));
+void gro_cleanup(struct gro_node *gro);
 
 #endif /* _NET_IPV6_GRO_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 8b334aa974c2..62bf6ee00741 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5812,9 +5812,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 		return false;
 	}
 
-	if (timeout)
-		hrtimer_start(&n->gro.timer, ns_to_ktime(timeout),
-			      HRTIMER_MODE_REL_PINNED);
+	gro_timer_start(&n->gro, timeout);
+
 	return ret;
 }
 EXPORT_SYMBOL(napi_complete_done);
@@ -5876,7 +5875,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
 		napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs);
 		timeout = READ_ONCE(napi->dev->gro_flush_timeout);
 		if (napi->defer_hard_irqs_count && timeout) {
-			hrtimer_start(&napi->gro.timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
+			gro_timer_start(&napi->gro, timeout);
 			skip_schedule = true;
 		}
 	}
@@ -6025,17 +6024,6 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-static void init_gro_hash(struct napi_struct *napi)
-{
-	int i;
-
-	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
-		INIT_LIST_HEAD(&napi->gro.hash[i].list);
-		napi->gro.hash[i].count = 0;
-	}
-	napi->gro.bitmask = 0;
-}
-
 int dev_set_threaded(struct net_device *dev, bool threaded)
 {
 	struct napi_struct *napi;
@@ -6105,12 +6093,8 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	INIT_LIST_HEAD(&napi->poll_list);
 	INIT_HLIST_NODE(&napi->napi_hash_node);
-	hrtimer_init(&napi->gro.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
-	napi->gro.timer.function = napi_watchdog;
-	init_gro_hash(napi);
+	gro_init(&napi->gro, napi_watchdog);
 	napi->skb = NULL;
-	INIT_LIST_HEAD(&napi->gro.rx_list);
-	napi->gro.rx_count = 0;
 	napi->poll = poll;
 	if (weight > NAPI_POLL_WEIGHT)
 		netdev_err_once(dev, "%s() called with weight %d\n", __func__,
@@ -6155,8 +6139,7 @@ void napi_disable(struct napi_struct *n)
 			break;
 	}
 
-	hrtimer_cancel(&n->gro.timer);
-
+	gro_timer_cancel(&n->gro);
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }
 EXPORT_SYMBOL(napi_disable);
@@ -6183,19 +6166,6 @@ void napi_enable(struct napi_struct *n)
 }
 EXPORT_SYMBOL(napi_enable);
 
-static void flush_gro_hash(struct napi_struct *napi)
-{
-	int i;
-
-	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
-		struct sk_buff *skb, *n;
-
-		list_for_each_entry_safe(skb, n, &napi->gro.hash[i].list, list)
-			kfree_skb(skb);
-		napi->gro.hash[i].count = 0;
-	}
-}
-
 /* Must be called in process context */
 void __netif_napi_del(struct napi_struct *napi)
 {
@@ -6206,8 +6176,7 @@ void __netif_napi_del(struct napi_struct *napi)
 
 	list_del_rcu(&napi->dev_list);
 	napi_free_frags(napi);
 
-	flush_gro_hash(napi);
-	napi->gro.bitmask = 0;
+	gro_cleanup(&napi->gro);
 
 	if (napi->thread) {
 		kthread_stop(napi->thread);
@@ -10627,7 +10596,7 @@ static int __init net_dev_init(void)
 		INIT_CSD(&sd->defer_csd, trigger_rx_softirq, sd);
 		spin_lock_init(&sd->defer_lock);
 
-		init_gro_hash(&sd->backlog);
+		gro_init(&sd->backlog.gro, NULL);
 		sd->backlog.poll = process_backlog;
 		sd->backlog.weight = weight_p;
 	}
diff --git a/net/core/gro.c b/net/core/gro.c
index 67fd587a87c9..424c812abe79 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -624,6 +624,18 @@ static gro_result_t gro_skb_finish(struct gro_node *gro, struct sk_buff *skb,
 	return ret;
 }
 
+void gro_receive_skb_list(struct gro_node *gro, struct list_head *list)
+{
+	struct sk_buff *skb, *tmp;
+
+	list_for_each_entry_safe(skb, tmp, list, list) {
+		skb_list_del_init(skb);
+
+		skb_gro_reset_offset(skb, 0);
+		gro_skb_finish(gro, skb, dev_gro_receive(gro, skb));
+	}
+}
+
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
 	struct gro_node *gro = &napi->gro;
@@ -792,3 +804,48 @@ __sum16 __skb_gro_checksum_complete(struct sk_buff *skb)
 	return sum;
 }
 EXPORT_SYMBOL(__skb_gro_checksum_complete);
+
+void gro_init(struct gro_node *gro,
+	      enum hrtimer_restart (*timer_cb)(struct hrtimer *))
+{
+	u32 i;
+
+	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
+		INIT_LIST_HEAD(&gro->hash[i].list);
+		gro->hash[i].count = 0;
+	}
+
+	gro->bitmask = 0;
+
+	INIT_LIST_HEAD(&gro->rx_list);
+	gro->rx_count = 0;
+
+	if (!timer_cb)
+		return;
+
+	hrtimer_init(&gro->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+	gro->timer.function = timer_cb;
+}
+
+void gro_cleanup(struct gro_node *gro)
+{
+	struct sk_buff *skb, *n;
+	u32 i;
+
+	gro_timer_cancel(gro);
+	memset(&gro->timer, 0, sizeof(gro->timer));
+
+	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
+		list_for_each_entry_safe(skb, n, &gro->hash[i].list, list)
+			kfree_skb(skb);
+
+		gro->hash[i].count = 0;
+	}
+
+	gro->bitmask = 0;
+
+	list_for_each_entry_safe(skb, n, &gro->rx_list, list)
+		kfree_skb(skb);
+
+	gro->rx_count = 0;
+}
From patchwork Tue Jun 28 19:47:52 2022

Subject: [PATCH RFC bpf-next 32/52] bpf, cpumap: switch to GRO from netif_receive_skb_list()
From: Alexander Lobakin
Date: Tue, 28 Jun 2022 21:47:52 +0200
Message-Id: <20220628194812.1453059-33-alexandr.lobakin@intel.com>

cpumap has its own BH context based on a kthread, with a sane batch size of 8 frames per cycle. Since GRO can now be used on its own, adjust the cpumap calls into the upper stack to use the GRO API instead of netif_receive_skb_list(), which processes skbs in batches but doesn't involve the GRO layer at all.
GRO is most beneficial when the NIC the frames come from is XDP generic-metadata-enabled, but in plenty of tests it performs better than list-based receiving even though it then has to calculate full frame checksums on the CPU.
As GRO passes the skbs to the upper stack in batches of @gro_normal_batch, i.e. 8 by default, and @skb->dev points to the device the frame came from, disabling the GRO netdev feature on that device is enough to completely restore the original behaviour: untouched frames will be bulked and passed to the upper stack by 8, as they were with netif_receive_skb_list().
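In practice the opt-out is thus per source device: something like ethtool -K <dev> gro off on the originating netdev (device name elided) should be all that is needed to get the pre-GRO cpumap behaviour back, as the GRO receive path honours the device's feature flags.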
Signed-off-by: Alexander Lobakin
---
 kernel/bpf/cpumap.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index f4860ac756cd..2d0edf8f6a05 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -29,8 +29,8 @@
 #include
 #include
 
-#include	/* netif_receive_skb_list */
-#include	/* eth_type_trans */
+#include
+#include
 
 /* General idea: XDP packets getting XDP redirected to another CPU,
  * will maximum be stored/queued for one driver ->poll() call. It is
@@ -67,6 +67,8 @@ struct bpf_cpu_map_entry {
 	struct bpf_cpumap_val value;
 	struct bpf_prog *prog;
 
+	struct gro_node gro;
+
 	atomic_t refcnt; /* Control when this struct can be free'ed */
 	struct rcu_head rcu;
 
@@ -162,6 +164,7 @@ static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
 	if (atomic_dec_and_test(&rcpu->refcnt)) {
 		if (rcpu->prog)
 			bpf_prog_put(rcpu->prog);
+		gro_cleanup(&rcpu->gro);
 		/* The queue should be empty at this point */
 		__cpu_map_ring_cleanup(rcpu->queue);
 		ptr_ring_cleanup(rcpu->queue, NULL);
@@ -295,6 +298,33 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 	return nframes;
 }
 
+static void cpu_map_gro_flush(struct bpf_cpu_map_entry *rcpu,
+			      struct list_head *list)
+{
+	bool new = !list_empty(list);
+
+	if (likely(new))
+		gro_receive_skb_list(&rcpu->gro, list);
+
+	if (rcpu->gro.bitmask) {
+		bool flush_old = HZ >= 1000;
+
+		/* If the ring is not empty, there'll be a new iteration
+		 * soon, and we only need to do a full flush if a tick is
+		 * long (> 1 ms).
+		 * If the ring is empty, to not hold GRO packets in the
+		 * stack for too long, do a full flush.
+		 * This is equivalent to how NAPI decides whether to perform
+		 * a full flush (by batches of up to 64 frames tho).
+		 */
+		if (__ptr_ring_empty(rcpu->queue))
+			flush_old = false;
+
+		__gro_flush(&rcpu->gro, flush_old);
+	}
+
+	gro_normal_list(&rcpu->gro);
+}
+
 static int cpu_map_kthread_run(void *data)
 {
@@ -384,7 +414,7 @@ static int cpu_map_kthread_run(void *data)
 			list_add_tail(&skb->list, &list);
 		}
-		netif_receive_skb_list(&list);
+		cpu_map_gro_flush(rcpu, &list);
 
 		/* Feedback loop via tracepoint */
 		trace_xdp_cpumap_kthread(rcpu->map_id, n, kmem_alloc_drops,
@@ -460,8 +490,10 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
 	rcpu->map_id = map->id;
 	rcpu->value.qsize = value->qsize;
 
+	gro_init(&rcpu->gro, NULL);
+
 	if (fd > 0 && __cpu_map_load_bpf_program(rcpu, map, fd))
-		goto free_ptr_ring;
+		goto free_gro;
 
 	/* Setup kthread */
 	rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa,
@@ -482,7 +514,8 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
free_prog:
 	if (rcpu->prog)
 		bpf_prog_put(rcpu->prog);
-free_ptr_ring:
+free_gro:
+	gro_cleanup(&rcpu->gro);
 	ptr_ring_cleanup(rcpu->queue, NULL);
free_queue:
 	kfree(rcpu->queue);
From patchwork Tue Jun 28 19:47:53 2022

Subject: [PATCH RFC bpf-next 33/52] bpf, cpumap: add option to set a timeout for deferred flush
From: Alexander Lobakin
Date: Tue, 28 Jun 2022 21:47:53 +0200
Message-Id: <20220628194812.1453059-34-alexandr.lobakin@intel.com>

GRO efficiency depends a lot on the batch size. With a size of 8, it is less efficient than e.g. NAPI with its size of 64. To reduce the share of full flushes and avoid holding GRO packets for too long, use the GRO hrtimer to wake up the kthread even when there are no new frames in the ptr_ring.
The timeout value is passed from the user side in the corresponding &bpf_cpumap_val on map creation, in nanoseconds. When the timeout is 0/unset, the behaviour is the same as prior to the change.
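From the userspace side, the new field is set like any other member of the cpumap value. A minimal hedged sketch using libbpf (map creation and error handling omitted; the 500 us figure is arbitrary, but within the 1 ms cap enforced below):

#include <bpf/bpf.h>
#include <linux/bpf.h>

/* add one CPU entry with a 500 us deferred-flush timeout */
static int add_cpu_entry(int cpumap_fd, __u32 cpu)
{
	struct bpf_cpumap_val val = {
		.qsize   = 2048,
		.timeout = 500 * 1000,	/* in ns; must not exceed 1 ms */
	};

	return bpf_map_update_elem(cpumap_fd, &cpu, &val, BPF_ANY);
}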
Signed-off-by: Alexander Lobakin
---
 include/uapi/linux/bpf.h       |  1 +
 kernel/bpf/cpumap.c            | 39 +++++++++++++++++++++++++++++-----
 tools/include/uapi/linux/bpf.h |  1 +
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1caaec1de625..097719ee2172 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5989,6 +5989,7 @@ struct bpf_cpumap_val {
 		int   fd;	/* prog fd on map write */
 		__u32 id;	/* prog id on map read */
 	} bpf_prog;
+	__u64 timeout;		/* timeout to wait for new packets, in ns */
 };
 
 enum sk_action {
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 2d0edf8f6a05..145f49de0931 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -95,7 +95,8 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
 	    (value_size != offsetofend(struct bpf_cpumap_val, qsize) &&
-	     value_size != offsetofend(struct bpf_cpumap_val, bpf_prog.fd)) ||
+	     value_size != offsetofend(struct bpf_cpumap_val, bpf_prog.fd) &&
+	     value_size != offsetofend(struct bpf_cpumap_val, timeout)) ||
 	    attr->map_flags & ~BPF_F_NUMA_NODE)
 		return ERR_PTR(-EINVAL);
 
@@ -312,18 +313,42 @@ static void cpu_map_gro_flush(struct bpf_cpu_map_entry *rcpu,
 		/* If the ring is not empty, there'll be a new iteration
 		 * soon, and we only need to do a full flush if a tick is
 		 * long (> 1 ms).
-		 * If the ring is empty, to not hold GRO packets in the
-		 * stack for too long, do a full flush.
+		 * If the ring is empty, and there were some new packets
+		 * processed, either do a partial flush and spin up a timer
+		 * to flush the rest if the timeout is set, or do a full
+		 * flush otherwise.
+		 * No new packets with non-zero gro_bitmask can mean that we
+		 * probably came from the timer call and/or there's [almost]
+		 * no activity here right now. To not hold GRO packets in
+		 * the stack for too long, do a full flush.
 		 * This is equivalent to how NAPI decides whether to perform
 		 * a full flush (by batches of up to 64 frames tho).
 		 */
 		if (__ptr_ring_empty(rcpu->queue))
-			flush_old = false;
+			flush_old = new ? !!rcpu->value.timeout : false;
 
 		__gro_flush(&rcpu->gro, flush_old);
 	}
 
 	gro_normal_list(&rcpu->gro);
+
+	/* Non-zero gro_bitmask at this point means that we have some packets
+	 * held in the GRO engine after a partial flush. If we have a timeout
+	 * set up, and there are no signs of a new kthread iteration, launch
+	 * a timer to flush them as well.
+	 */
+	if (rcpu->gro.bitmask && __ptr_ring_empty(rcpu->queue))
+		gro_timer_start(&rcpu->gro, rcpu->value.timeout);
+}
+
+static enum hrtimer_restart cpu_map_gro_watchdog(struct hrtimer *timer)
+{
+	const struct bpf_cpu_map_entry *rcpu;
+
+	rcpu = container_of(timer, typeof(*rcpu), gro.timer);
+	wake_up_process(rcpu->kthread);
+
+	return HRTIMER_NORESTART;
 }
 
 static int cpu_map_kthread_run(void *data)
@@ -489,8 +514,9 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
 	rcpu->cpu = cpu;
 	rcpu->map_id = map->id;
 	rcpu->value.qsize = value->qsize;
+	rcpu->value.timeout = value->timeout;
 
-	gro_init(&rcpu->gro, NULL);
+	gro_init(&rcpu->gro, cpu_map_gro_watchdog);
 
 	if (fd > 0 && __cpu_map_load_bpf_program(rcpu, map, fd))
 		goto free_gro;
@@ -606,6 +632,9 @@ static int cpu_map_update_elem(struct bpf_map *map, void *key, void *value,
 		return -EEXIST;
 	if (unlikely(cpumap_value.qsize > 16384)) /* sanity limit on qsize */
 		return -EOVERFLOW;
+	/* Don't allow timeout longer than 1 ms -- 1 tick on HZ == 1000 */
+	if (unlikely(cpumap_value.timeout > 1 * NSEC_PER_MSEC))
+		return -ERANGE;
 
 	/* Make sure CPU is a valid possible cpu */
 	if (key_cpu >= nr_cpumask_bits || !cpu_possible(key_cpu))
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 436b925adfb3..a3579cdb0225 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5989,6 +5989,7 @@ struct bpf_cpumap_val {
 		int   fd;	/* prog fd on map write */
 		__u32 id;	/* prog id on map read */
 	} bpf_prog;
+	__u64 timeout;		/* timeout to wait for new packets, in ns */
 };
 
 enum sk_action {
From patchwork Tue Jun 28 19:47:54 2022

Subject: [PATCH RFC bpf-next 34/52] samples/bpf: add 'timeout' option to xdp_redirect_cpu
From: Alexander Lobakin
Date: Tue, 28 Jun 2022 21:47:54 +0200
Message-Id: <20220628194812.1453059-35-alexandr.lobakin@intel.com>

Add the ability to specify a deferred-flush timeout (in usec, not nsec!) when setting up a cpumap in the xdp_redirect_cpu sample.

Signed-off-by: Alexander Lobakin
---
 samples/bpf/xdp_redirect_cpu_user.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c
index ca457c34eb0f..d184c3fcab53 100644
--- a/samples/bpf/xdp_redirect_cpu_user.c
+++ b/samples/bpf/xdp_redirect_cpu_user.c
@@ -34,6 +34,8 @@ static const char *__doc__ =
 #include "xdp_sample_user.h"
 #include "xdp_redirect_cpu.skel.h"
 
+#define NSEC_PER_USEC 1000UL
+
 static int map_fd;
 static int avail_fd;
 static int count_fd;
@@ -61,6 +63,7 @@ static const struct option long_options[] = {
 	{ "redirect-device", required_argument, NULL, 'r' },
 	{ "redirect-map", required_argument, NULL, 'm' },
 	{ "meta-thresh", optional_argument, NULL, 'M' },
+	{ "timeout", required_argument, NULL, 't'},
 	{}
 };
 
@@ -128,9 +131,10 @@ static int create_cpu_entry(__u32 cpu, struct bpf_cpumap_val *value,
 		}
 	}
 
-	printf("%s CPU: %u as idx: %u qsize: %d cpumap_prog_fd: %d (cpus_count: %u)\n",
+	printf("%s CPU: %u as idx: %u qsize: %d timeout: %llu cpumap_prog_fd: %d (cpus_count: %u)\n",
 	       new ? "Add new" : "Replace", cpu, avail_idx,
-	       value->qsize, value->bpf_prog.fd, curr_cpus_count);
+	       value->qsize, value->timeout, value->bpf_prog.fd,
+	       curr_cpus_count);
 
 	return 0;
 }
@@ -346,6 +350,7 @@ int main(int argc, char **argv)
 	 * tuned-adm profile network-latency
 	 */
 	qsize = 2048;
+	value.timeout = 0;	/* Defaults to 0 to mimic the previous behaviour.
+				 */
 
 	skel = xdp_redirect_cpu__open();
 	if (!skel) {
@@ -383,7 +388,7 @@ int main(int argc, char **argv)
 	}
 	prog = skel->progs.xdp_prognum5_lb_hash_ip_pairs;
-	while ((opt = getopt_long(argc, argv, "d:si:Sxp:f:e:r:m:c:q:FMvh",
+	while ((opt = getopt_long(argc, argv, "d:si:Sxp:f:e:r:m:c:q:FMt:vh",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
 		case 'd':
@@ -466,6 +471,10 @@ int main(int argc, char **argv)
 			opts.meta_thresh = optarg ? strtoul(optarg, NULL, 0) :
 					   1;
 			break;
+		case 't':
+			value.timeout = strtoull(optarg, NULL, 0) *
+					NSEC_PER_USEC;
+			break;
 		case 'h':
 			error = false;
 		default:
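Usage-wise, passing e.g. --timeout 250 (or -t 250) to the sample now requests a 250 us deferred-flush window, converted to nanoseconds before being written into the cpumap value; leaving the option out keeps the previous always-full-flush behaviour.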
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 35/52] net, skbuff: introduce napi_skb_cache_get_bulk() Date: Tue, 28 Jun 2022 21:47:55 +0200 Message-Id: <20220628194812.1453059-36-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add a function to get an array of skbs from the NAPI percpu cache. It's supposed to be a drop-in replacement for kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the requirement to call it only from the BH) is that it tries to use as many NAPI cache entries for skbs as possible, and allocate new ones only if and as less as needed. It can save significant amounts of CPU cycles if there are GRO cycles and/or Tx completion cycles (anything that descends to napi_skb_cache_put()) happening on this CPU. If the function is not able to provide the requested number of entries due to an allocation error, it returns as much as it got. Signed-off-by: Alexander Lobakin --- include/linux/skbuff.h | 1 + net/core/skbuff.c | 43 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 44 insertions(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0a95f753c1d9..0c1e5446653b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1240,6 +1240,7 @@ struct sk_buff *build_skb_around(struct sk_buff *skb, void skb_attempt_defer_free(struct sk_buff *skb); struct sk_buff *napi_build_skb(void *data, unsigned int frag_size); +size_t napi_skb_cache_get_bulk(void **skbs, size_t n); /** * alloc_skb - allocate a network buffer diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 5b23fc7f1157..9b075f52d1fb 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -190,6 +190,49 @@ static struct sk_buff *napi_skb_cache_get(void) return skb; } +/** + * napi_skb_cache_get_bulk - obtain a number of zeroed skb heads from the cache + * @skbs: a pointer to an at least @n-sized array to fill with skb pointers + * @n: the number of entries to provide + * + * Tries to obtain @n &sk_buff entries from the NAPI percpu cache and writes + * the pointers into the provided array @skbs. If there are less entries + * available, bulk-allocates the diff from the MM layer. + * The heads are being zeroed with either memset() or %__GFP_ZERO, so they are + * ready for {,__}build_skb_around() and don't have any data buffers attached. + * Must be called *only* from the BH context. + * + * Returns the number of successfully allocated skbs (@n if + * kmem_cache_alloc_bulk() didn't fail). 
+ */ +size_t napi_skb_cache_get_bulk(void **skbs, size_t n) +{ + struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache); + size_t total = n; + + if (nc->skb_count < n) + n -= kmem_cache_alloc_bulk(skbuff_head_cache, + GFP_ATOMIC | __GFP_ZERO, + n - nc->skb_count, + skbs + nc->skb_count); + if (unlikely(nc->skb_count < n)) { + total -= n - nc->skb_count; + n = nc->skb_count; + } + + for (size_t i = 0; i < n; i++) { + skbs[i] = nc->skb_cache[nc->skb_count - n + i]; + + kasan_unpoison_object_data(skbuff_head_cache, skbs[i]); + memset(skbs[i], 0, offsetof(struct sk_buff, tail)); + } + + nc->skb_count -= n; + + return total; +} +EXPORT_SYMBOL_GPL(napi_skb_cache_get_bulk); + /* Caller must provide SKB that is memset cleared */ static void __build_skb_around(struct sk_buff *skb, void *data, unsigned int frag_size) From patchwork Tue Jun 28 19:47:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898887 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 984C0C433EF for ; Tue, 28 Jun 2022 19:54:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233771AbiF1Tyl (ORCPT ); Tue, 28 Jun 2022 15:54:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232145AbiF1Tu5 (ORCPT ); Tue, 28 Jun 2022 15:50:57 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E6C27101D5; Tue, 28 Jun 2022 12:49:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445788; x=1687981788; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=f++W//QJlwHfgm1DNYa0qsrg7yVxZsil1tIeQfMWplE=; b=AfngGRlqtLpC26acAHuYJlvLDUCzpYtJFlIVJ3OpNBzqU1GAhGZOWofQ xKodbPyGv1lHZzKBTylTQaRxe3jZzTCVU7TrwFu7fIOfXmqBVp5xNAGiG 2B3mwNg+VMPj5dmEyro/+9Do9pu9fKSQtn2r/Tfv3S7tSPGt5OY7I+FwN eFK+xR81T1/fU28tRQx/CLEb8Ubdf4vmF7kWCDdBkdLCua7oriLGRp1si RFuuNH9dGcXL3gxC1/GHJS60PcN5SR7XsmujSnmMKlUqOFV//HEpZjDLM 8ot3BKOZTTf0pZ4aKWldmljbiJURTeeyCHNRHZSH81NC4VaxGhgz9sfza g==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="264874207" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="264874207" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="565181352" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga006.jf.intel.com with ESMTP; 28 Jun 2022 12:49:43 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9a022013; Tue, 28 Jun 2022 20:49:41 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 36/52] bpf, cpumap: switch to napi_skb_cache_get_bulk() Date: Tue, 28 Jun 2022 21:47:56 +0200 Message-Id: <20220628194812.1453059-37-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Now that cpumap uses GRO, which drops unused skb heads to the NAPI cache, use napi_skb_cache_get_bulk() to try to reuse cached entries and lower the MM layer pressure. In the situation when all 8 skbs from one cpumap batch goes into one GRO skb (so the rest 7 go into the cache), there will now be only 1 skb to allocate per cycle instead of 8. If there is some other work happening in between the cycles, even all 8 might be getting decached each cycle. This makes the BH-off period per each batch slightly longer -- previously, skb allocation was happening in the process context. Signed-off-by: Alexander Lobakin --- kernel/bpf/cpumap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 145f49de0931..1bb3ae570e6c 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -365,7 +365,6 @@ static int cpu_map_kthread_run(void *data) while (!kthread_should_stop() || !__ptr_ring_empty(rcpu->queue)) { struct xdp_cpumap_stats stats = {}; /* zero stats */ unsigned int kmem_alloc_drops = 0, sched = 0; - gfp_t gfp = __GFP_ZERO | GFP_ATOMIC; int i, n, m, nframes, xdp_n; void *frames[CPUMAP_BATCH]; void *skbs[CPUMAP_BATCH]; @@ -416,8 +415,10 @@ static int cpu_map_kthread_run(void *data) /* Support running another XDP prog on this CPU */ nframes = cpu_map_bpf_prog_run(rcpu, frames, xdp_n, &stats, &list); + local_bh_disable(); + if (nframes) { - m = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, nframes, skbs); + m = napi_skb_cache_get_bulk(skbs, nframes); if (unlikely(m == 0)) { for (i = 0; i < nframes; i++) skbs[i] = NULL; /* effect: xdp_return_frame */ @@ -425,7 +426,6 @@ static int cpu_map_kthread_run(void *data) } } - local_bh_disable(); for (i = 0; i < nframes; i++) { struct xdp_frame *xdpf = frames[i]; struct sk_buff *skb = skbs[i]; From patchwork Tue Jun 28 19:47:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898885 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B502CCA481 for ; Tue, 28 Jun 2022 19:54:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233754AbiF1Tyi (ORCPT ); Tue, 28 Jun 2022 15:54:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232430AbiF1Tu4 (ORCPT ); Tue, 28 Jun 2022 15:50:56 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C8C013DC8; Tue, 28 
Jun 2022 12:49:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445789; x=1687981789; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1xrvhIX2IWhOahXdK1kR3UK2m56gKZ4b7bS+Cp3UyoY=; b=UNFyog5gECIDIzKB4WbDHf9yKQDb9XjLLsdTPouQSnCuIsdlBU7D99DB pUT+u/qvn55h8k2lV5ntKVTM+EjB6FZCp0VYkmCN+0OSx7X3Xefs7xxcU Cz1qddahTEdaPbKo2DrBjIx/bm374ZBVwNA1c4I1CDOxirPF6RfVts5Mw 1IbGnXv+0aTNPvc1+4ArAHGxil2ozCRW58g4Cka4SaiseFeIy83bpf8mf lxR/WN3G7LokNruh7oPEFB6LvRn0G3jt/CLdeswAqPftB5bo8iQScUZpJ mSIMMtFh14lqkZepDeLPoUC9PBaPxDBNKS8Onsow/QPBEsAnhoW7ya1uj w==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="262242974" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="262242974" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="836809521" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga006.fm.intel.com with ESMTP; 28 Jun 2022 12:49:45 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9b022013; Tue, 28 Jun 2022 20:49:43 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 37/52] rcupdate: fix access helpers for incomplete struct pointers on GCC < 10 Date: Tue, 28 Jun 2022 21:47:57 +0200 Message-Id: <20220628194812.1453059-38-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC It's been found that currently it is impossible to use RCU for incomplete struct pointers. RCU access helpers have the following construct: typeof(*p) *local = ... GCC versions older than 10 don't look at the whole sentence and believe that there's a dereference happening inside the typeof(), although it does not. As RCU doesn't imply any dereference, but only the way to store and access pointers, this is not a valid case. Moreover, Clang and GCC 10 onwards evaluate it with no issues. Fix this by introducing a new macro, __rcutype(), which will take care of pointer annotations inside the RCU access helpers, in two different ways depending on the compiler used. For sane compilers, leave it as it is for now, as it ensures that the passed argument is a pointer, and for the affected ones use... `typeof(0 ? (p) : (p))`. As: void fc(void) { } ... pr_info("%d", __builtin_types_compatible(typeof(*fn) *, typeof(fn))); pr_info("%d", __builtin_types_compatible(typeof(*fn) *, typeof(&fn))); pr_info("%d", __builtin_types_compatible(typeof(*fn) *, typeof(0 ? (fn) : (fn))); emits: 011 and we can't use the second for non-functions. 
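To see the quirk outside the kernel tree, here is a minimal
standalone sketch of the failure mode described above (not part of
this patch; `struct opaque' and the two macro names are made up for
illustration, only the typeof() constructs follow the change):

	struct opaque;					/* incomplete on purpose */

	#define OLD_RCUTYPE(p)	typeof(*(p)) *		/* what the helpers used */
	#define NEW_RCUTYPE(p)	typeof(0 ? (p) : (p))	/* the workaround */

	struct opaque *demo(struct opaque *p)
	{
		/* GCC < 10 rejects the next line, treating the typeof()
		 * as a dereference of an incomplete type:
		 */
		/* OLD_RCUTYPE(p) local = p; */

		/* the ternary only spells the pointer type, no deref */
		NEW_RCUTYPE(p) local = p;

		return local;
	}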
Fixes: ca5ecddfa8fc ("rcu: define __rcu address space modifier for sparse")
Signed-off-by: Alexander Lobakin
---
 include/linux/rcupdate.h | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1a32036c918c..f5971fccf852 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -358,18 +358,33 @@ static inline void rcu_preempt_sleep_check(void) { }
  * (e.g., __srcu), should this make sense in the future.
  */
 
+/*
+ * Unfortunately, GCC versions older than 10 don't look at the whole sentence
+ * and treat `typeof(*(p)) *` as dereferencing although it is not. This makes
+ * it impossible to use those helpers with pointers to incomplete structures.
+ * Plain `typeof(p)` is not the same, as `typeof(func)` returns the type of a
+ * function, not a pointer to it, as `typeof(*(func)) *` does.
+ * `typeof(0 ? (func) : (func))` is silly; however, it works just as
+ * the original definition does.
+ */
+#if defined(CONFIG_CC_IS_GCC) && CONFIG_GCC_VERSION < 100000
+#define __rcutype(p, ...)	typeof(0 ? (p) : (p)) __VA_ARGS__
+#else
+#define __rcutype(p, ...)	typeof(*(p)) __VA_ARGS__ *
+#endif
+
 #ifdef __CHECKER__
 #define rcu_check_sparse(p, space) \
-	((void)(((typeof(*p) space *)p) == p))
+	((void)((__rcutype(p, space))(p) == (p)))
 #else /* #ifdef __CHECKER__ */
 #define rcu_check_sparse(p, space)
 #endif /* #else #ifdef __CHECKER__ */
 
 #define __unrcu_pointer(p, local)					\
 ({									\
-	typeof(*p) *local = (typeof(*p) *__force)(p);			\
+	__rcutype(p) local = (__rcutype(p, __force))(p);		\
 	rcu_check_sparse(p, __rcu);					\
-	((typeof(*p) __force __kernel *)(local));			\
+	((__rcutype(p, __force __kernel))(local));			\
 })
 /**
  * unrcu_pointer - mark a pointer as not being RCU protected
@@ -382,29 +397,29 @@ static inline void rcu_preempt_sleep_check(void) { }
 
 #define __rcu_access_pointer(p, local, space) \
 ({									\
-	typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p);		\
+	__rcutype(p) local = (__rcutype(p, __force))READ_ONCE(p);	\
 	rcu_check_sparse(p, space);					\
-	((typeof(*p) __force __kernel *)(local));			\
+	((__rcutype(p, __force __kernel))(local));			\
 })
 
 #define __rcu_dereference_check(p, local, c, space) \
 ({									\
 	/* Dependency order vs. p above. */				\
-	typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p);		\
+	__rcutype(p) local = (__rcutype(p, __force))READ_ONCE(p);	\
 	RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
 	rcu_check_sparse(p, space);					\
-	((typeof(*p) __force __kernel *)(local));			\
+	((__rcutype(p, __force __kernel))(local));			\
 })
 
 #define __rcu_dereference_protected(p, local, c, space) \
 ({									\
 	RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_protected() usage"); \
 	rcu_check_sparse(p, space);					\
-	((typeof(*p) __force __kernel *)(p));				\
+	((__rcutype(p, __force __kernel))(p));				\
 })
 
 #define __rcu_dereference_raw(p, local) \
 ({									\
 	/* Dependency order vs. p above. */				\
-	typeof(p) local = READ_ONCE(p);					\
-	((typeof(*p) __force __kernel *)(local));			\
+	__rcutype(p) local = READ_ONCE(p);				\
+	((__rcutype(p, __force __kernel))(local));			\
 })
 
 #define rcu_dereference_raw(p) __rcu_dereference_raw(p, __UNIQUE_ID(rcu))
 
@@ -412,7 +427,7 @@ static inline void rcu_preempt_sleep_check(void) { }
  * RCU_INITIALIZER() - statically initialize an RCU-protected global variable
  * @v: The value to statically initialize with.
*/ -#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v) +#define RCU_INITIALIZER(v) (__rcutype(v, __force __rcu))(v) /** * rcu_assign_pointer() - assign to RCU-protected pointer From patchwork Tue Jun 28 19:47:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898860 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7EC8C433EF for ; Tue, 28 Jun 2022 19:53:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231842AbiF1TxO (ORCPT ); Tue, 28 Jun 2022 15:53:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45642 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232467AbiF1Tu5 (ORCPT ); Tue, 28 Jun 2022 15:50:57 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 96CC82B268; Tue, 28 Jun 2022 12:49:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445791; x=1687981791; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=42auUlJ9mEBV1h64OGRghrHR1EB9VY/2pp7kr00mF34=; b=jWtnNgFIhMALkWQWS2TQ6Li6D5A08upb/zA6uNPPzE0LdO+8Onpyxngu +voDegBTk9VTZ+yM+zl12icuHzvQDtXMW3wplNRvPhIhRsv70VYOkueiJ uhwKmxq+JxLThLSs3mObz5LMFTkz/L14NiyKLSv/Vp/jsqTpaQ1PrXcyb VmcaGY+SEYRLXSHG9kyT2xV8JiQpe1Ny+h4EdqH8pkYj+WhBAtLzEvTPK LBt9e2jfn5gbb3PgZapmYqmRNxo/KqXddfGvTSOWWyGhr/YMhioUtlkox N8Vic0QkeNWwRFJrIW5NxeKmXTvrODlHFauaCmfOB82ms5E1rUWJr5ZRT w==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="280595981" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="280595981" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="587988569" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga007.jf.intel.com with ESMTP; 28 Jun 2022 12:49:46 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9c022013; Tue, 28 Jun 2022 20:49:44 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 38/52] net, xdp: remove unused xdp_attachment_info::flags Date: Tue, 28 Jun 2022 21:47:58 +0200 Message-Id: <20220628194812.1453059-39-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Since %XDP_QUERY_PROG was removed, the ::flags field is not used anymore. It's being written by xdp_attachment_setup(), but never read. Remove it. Signed-off-by: Alexander Lobakin --- include/net/xdp.h | 1 - net/bpf/core.c | 1 - 2 files changed, 2 deletions(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index 1663d0b3a05a..d1fd809655be 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -382,7 +382,6 @@ struct xdp_attachment_info { struct bpf_prog *prog; u64 btf_id; u32 meta_thresh; - u32 flags; }; struct netdev_bpf; diff --git a/net/bpf/core.c b/net/bpf/core.c index d2d01b8e6441..65f25019493d 100644 --- a/net/bpf/core.c +++ b/net/bpf/core.c @@ -554,7 +554,6 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, info->prog = bpf->prog; info->btf_id = bpf->btf_id; info->meta_thresh = bpf->meta_thresh; - info->flags = bpf->flags; } EXPORT_SYMBOL_GPL(xdp_attachment_setup); From patchwork Tue Jun 28 19:47:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898863 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA0CFCCA479 for ; Tue, 28 Jun 2022 19:53:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232124AbiF1Tx0 (ORCPT ); Tue, 28 Jun 2022 15:53:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44860 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232487AbiF1Tu7 (ORCPT ); Tue, 28 Jun 2022 15:50:59 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BFB8E2DD78; Tue, 28 Jun 2022 12:49:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445796; x=1687981796; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ht3UfTVIR3bDx+pf/6oyHp02PGIIQJZ5QLXufKfdPmY=; b=IfKOPtkyuqHWDCMOUAHZUPZrJUwZHn089HW50CTQ6lgSkjU6uTeoEimT 7b6ahgE79FxFAGVq13uGEyJqfN5QvTA3md4TQUboSIg/QJ9wu/QP205Vm M6HsD+zUeLUSOB5vyx/Nlbwp58aLCkGa3hQ04FGOKD+luP6Pq+q9X0AD5 y+jtZvE8mVvrnw5zfRyVgWWc8I1Tu1OB3XhC8IyJjRg4M9w/B/YyBOaMy htIblI4JbrfK9PdmGSMEUe8tdao8AwnS/uEf0QHX+7WasXQiljHPR+Dbt XnnqlJIy6Uena6/vrm/anv/Hfh4Zq68krycShv3mGnQrErg6ZvGg0ce7Z w==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="343523339" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="343523339" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga104.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:53 -0700
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen, Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng, Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [PATCH RFC bpf-next 39/52] net, xdp: make &xdp_attachment_info a bit more useful in drivers
Date: Tue, 28 Jun 2022 21:47:59 +0200
Message-Id: <20220628194812.1453059-40-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com>

Add a new field to store an arbitrary 'driver cookie'. The closest
usage is to store there an enum value corresponding to the metadata
type supported by a driver, to shortcut metadata handling on the
hotpath. In fact, it just reuses the 4-byte padding at the end of
the structure.
Also, make it possible to store the BTF ID in LE rather than CPU
byteorder, so that drivers can save some cycles on [potential]
byteswapping on the hotpath, as shown in the sketch below.
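A rough kernel-context illustration of what the LE copy buys
(hypothetical driver code, not from this series; only the
btf_id_le/btf_id union follows the patch, the two helpers and the
assumption that the frame metadata carries its type ID in LE are
made up for demonstration):

	/* slow path, on (re)attach: byteswap once and cache the result;
	 * cpu_to_le64() is a no-op on LE CPUs and a single swab on BE
	 */
	static void drv_cache_btf_id(struct xdp_attachment_info *info, u64 id)
	{
		info->btf_id_le = cpu_to_le64(id);
	}

	/* hotpath, per frame: if the driver also writes the type ID into
	 * the metadata in LE, the match check needs no byteswap at all
	 */
	static bool drv_meta_matches(const struct xdp_attachment_info *info,
				     __le64 frame_btf_id_le)
	{
		return info->btf_id_le == frame_btf_id_le;
	}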
Signed-off-by: Alexander Lobakin --- include/net/xdp.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index d1fd809655be..5762ce18885f 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -380,8 +380,12 @@ void xdp_unreg_mem_model(struct xdp_mem_info *mem); struct xdp_attachment_info { struct bpf_prog *prog; - u64 btf_id; + union { + __le64 btf_id_le; + u64 btf_id; + }; u32 meta_thresh; + u32 drv_cookie; }; struct netdev_bpf; From patchwork Tue Jun 28 19:48:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898866 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12A8FCCA479 for ; Tue, 28 Jun 2022 19:53:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232943AbiF1Txa (ORCPT ); Tue, 28 Jun 2022 15:53:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232141AbiF1Tu6 (ORCPT ); Tue, 28 Jun 2022 15:50:58 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F4952C11D; Tue, 28 Jun 2022 12:49:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445793; x=1687981793; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=7sizNGB2psf+Qq79FGgS+INcdvERhbHoOrovj1ZDfPY=; b=M/JQpYwzZJwJYLF9Tj5DbVX/0AwPNHPJB73ZyBrBDIdwyHCe49XBwpR5 NgPfT55GO0dHoPYWA8eIO1XipcSEop4RGgEbj90jSUDC/j8AtQye8Kz3a yRRIJ2lX096Zwl9nkc/d1dI12k8xMX9U2NrQXg7CaAVKQPQ4OpVTactsB ECbtAaZa90vepE9sn+IYWrliZrLyBN57h39Mg5Mf0xZmSxInI6ySCFi+P DjRE2AsqqU26JByo0Mseb8piFLX0OEiZ8QMN5eIskMR1KfNPjoHWShBTg +YGPx9FztLnCnYbwpD05jmNx6IVx7oGj+oeiyDGWnwu6lyF8z9Mx3wpql w==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="262242993" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="262242993" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="836809532" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga006.fm.intel.com with ESMTP; 28 Jun 2022 12:49:49 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9e022013; Tue, 28 Jun 2022 20:49:47 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 40/52] net, xdp: add an RCU version of xdp_attachment_setup() Date: Tue, 28 Jun 2022 21:48:00 +0200 Message-Id: <20220628194812.1453059-41-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Currently, xdp_attachment_setup() uses plain assignments and puts the previous BPF program before updating the pointer, rendering itself dangerous for program hot-swaps due to pointer tearing and potential use-after-free's. At the same time, &xdp_attachment_info comes handy to use it in drivers as a main container including hotpath -- the BTF ID and meta threshold values are now being used there as well, not speaking of reducing some boilerplate code. Add an RCU-protected pointer to XDP program to that structure and an RCU version of xdp_attachment_setup(), which will make sure that all the values were not corrupted and that old BPF program was freed only after the pointer was updated. The only thing left is that RCU read critical sections might happen in between each assignment, but since the relations between XDP prog, BTF ID and meta threshold are not vital, it's totally fine to allow this. A caller must ensure it's being executed under the RTNL lock. Reader sides must ensure they're being executed under the RCU read lock. Once all the current users of xdp_attachment_setup() are switched to the RCU-aware version (with appropriate adjustments), the "regular" one will be removed. Partially inspired by commit fe45386a2082 ("net/mlx5e: Use RCU to protect rq->xdp_prog"). Signed-off-by: Alexander Lobakin --- include/net/xdp.h | 7 ++++++- net/bpf/core.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index 5762ce18885f..49e562e4fcca 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -379,7 +379,10 @@ int xdp_reg_mem_model(struct xdp_mem_info *mem, void xdp_unreg_mem_model(struct xdp_mem_info *mem); struct xdp_attachment_info { - struct bpf_prog *prog; + union { + struct bpf_prog __rcu *prog_rcu; + struct bpf_prog *prog; + }; union { __le64 btf_id_le; u64 btf_id; @@ -391,6 +394,8 @@ struct xdp_attachment_info { struct netdev_bpf; void xdp_attachment_setup(struct xdp_attachment_info *info, struct netdev_bpf *bpf); +void xdp_attachment_setup_rcu(struct xdp_attachment_info *info, + struct netdev_bpf *bpf); #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE diff --git a/net/bpf/core.c b/net/bpf/core.c index 65f25019493d..d444d0555057 100644 --- a/net/bpf/core.c +++ b/net/bpf/core.c @@ -557,6 +557,34 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, } EXPORT_SYMBOL_GPL(xdp_attachment_setup); +/** + * xdp_attachment_setup_rcu - an RCU-powered version of xdp_attachment_setup() + * @info: pointer to the target container + * @bpf: pointer to the container passed to ::ndo_bpf() + * + * Protects sensitive values with RCU to allow program how-swaps without + * stopping an interface. 
Write side (this) must be called under the RTNL lock + * and reader sides must fetch any data only under the RCU read lock -- old BPF + * program will be freed only after a critical section is finished (see + * bpf_prog_put()). + */ +void xdp_attachment_setup_rcu(struct xdp_attachment_info *info, + struct netdev_bpf *bpf) +{ + struct bpf_prog *old_prog; + + ASSERT_RTNL(); + + old_prog = rcu_replace_pointer(info->prog_rcu, bpf->prog, + lockdep_rtnl_is_held()); + WRITE_ONCE(info->btf_id, bpf->btf_id); + WRITE_ONCE(info->meta_thresh, bpf->meta_thresh); + + if (old_prog) + bpf_prog_put(old_prog); +} +EXPORT_SYMBOL_GPL(xdp_attachment_setup_rcu); + struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp) { unsigned int metasize, totsize; From patchwork Tue Jun 28 19:48:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898878 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3549C433EF for ; Tue, 28 Jun 2022 19:54:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233517AbiF1TyK (ORCPT ); Tue, 28 Jun 2022 15:54:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232805AbiF1TvW (ORCPT ); Tue, 28 Jun 2022 15:51:22 -0400 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 79CD623B; Tue, 28 Jun 2022 12:50:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445811; x=1687981811; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ufgJOb+TezFSxWKDyYwF8u0ehQj1wQzhK8T9FWSr3eU=; b=LfXKIyab3d+L51y9sjQI6XC/tWQnm16jbae18+N3+TXqh/OCcAn63WRW dCoXDn5GYqmrQltNAKqKIzsaoralX89cQuFoZ03EnSqyo2mflyw9pXEl0 /MW5IT8bVpgGSadP6EUXoo2ZNa37P0j1IrKDNg4oznvaRMC1gKElXfyw0 k+HkXRkHhEAeKhxcuuQAefVDL33IqSXwvkHXkW5O68eI+Nkwx2IuHRH6V 0S6rzekNxy+9L0wf4VBZXoGMhUzdaI/9VjRdPFUdSawuZH/Ep12jnThfG apHzKTjWRtuqqg6ZlnARaiDEI82AR7j1XQOwDJBsj6gGOAsZhwBMTV70H g==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="368147011" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="368147011" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="617303166" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga008.jf.intel.com with ESMTP; 28 Jun 2022 12:49:50 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9f022013; Tue, 28 Jun 2022 20:49:48 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 41/52] net, xdp: replace net_device::xdp_prog pointer with &xdp_attachment_info Date: Tue, 28 Jun 2022 21:48:01 +0200 Message-Id: <20220628194812.1453059-42-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC To have access and store not only BPF prog pointer, but also auxiliary params on Generic (skb) XDP path, replace it with an &xdp_attachment_info struct and use xdp_attachment_setup_rcu() (since Generic XDP code RCU-protects the pointer already). This slightly changes the struct &net_device cacheline layout, but nothing performance-critical. Signed-off-by: Alexander Lobakin --- include/linux/netdevice.h | 7 +++---- net/bpf/dev.c | 11 ++++------- net/core/dev.c | 4 +++- net/core/rtnetlink.c | 2 +- 4 files changed, 11 insertions(+), 13 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 60df42b3f116..1c033c164257 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2168,7 +2168,7 @@ struct net_device { unsigned int num_rx_queues; unsigned int real_num_rx_queues; - struct bpf_prog __rcu *xdp_prog; + struct xdp_attachment_info xdp_info; unsigned long gro_flush_timeout; int napi_defer_hard_irqs; #define GRO_LEGACY_MAX_SIZE 65536u @@ -2343,9 +2343,8 @@ struct net_device { static inline bool netif_elide_gro(const struct net_device *dev) { - if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog) - return true; - return false; + return !(dev->features & NETIF_F_GRO) || + rcu_access_pointer(dev->xdp_info.prog_rcu); } #define NETDEV_ALIGN 32 diff --git a/net/bpf/dev.c b/net/bpf/dev.c index 82948d0536c8..cc43f73929f3 100644 --- a/net/bpf/dev.c +++ b/net/bpf/dev.c @@ -242,19 +242,16 @@ static void dev_disable_gro_hw(struct net_device *dev) static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp) { - struct bpf_prog *old = rtnl_dereference(dev->xdp_prog); - struct bpf_prog *new = xdp->prog; + bool old = !!rtnl_dereference(dev->xdp_info.prog_rcu); int ret = 0; switch (xdp->command) { case XDP_SETUP_PROG: - rcu_assign_pointer(dev->xdp_prog, new); - if (old) - bpf_prog_put(old); + xdp_attachment_setup_rcu(&dev->xdp_info, xdp); - if (old && !new) { + if (old && !xdp->prog) { static_branch_dec(&generic_xdp_needed_key); - } else if (new && !old) { + } else if (xdp->prog && !old) { static_branch_inc(&generic_xdp_needed_key); dev_disable_lro(dev); dev_disable_gro_hw(dev); diff --git a/net/core/dev.c b/net/core/dev.c index 62bf6ee00741..e57ae87d619e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5055,10 +5055,12 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc, __this_cpu_inc(softnet_data.processed); if (static_branch_unlikely(&generic_xdp_needed_key)) { + struct bpf_prog *prog; int ret2; migrate_disable(); - ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb); + prog = rcu_dereference(skb->dev->xdp_info.prog_rcu); + ret2 = do_xdp_generic(prog, skb); migrate_enable(); if (ret2 != XDP_PASS) { diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 
500420d5017c..72f696b12df2 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1451,7 +1451,7 @@ static u32 rtnl_xdp_prog_skb(struct net_device *dev) ASSERT_RTNL(); - generic_xdp_prog = rtnl_dereference(dev->xdp_prog); + generic_xdp_prog = rtnl_dereference(dev->xdp_info.prog_rcu); if (!generic_xdp_prog) return 0; return generic_xdp_prog->aux->id; From patchwork Tue Jun 28 19:48:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898865 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 333B1C43334 for ; Tue, 28 Jun 2022 19:53:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232840AbiF1Tx1 (ORCPT ); Tue, 28 Jun 2022 15:53:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232495AbiF1Tu7 (ORCPT ); Tue, 28 Jun 2022 15:50:59 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C319B2DD7B; Tue, 28 Jun 2022 12:49:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445796; x=1687981796; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=g8Auyiv6wHbONHbwKT22jacyNDfM23OeRa/UWJqi3CM=; b=YdjyG9s+lBgEx52Mi0a3xSijbBo66ikErxkPW+n3P7LM4lonlSWU35e6 p+thIwS3/BYJRLD4P+oOt7lninnwgGVcAlbK0aBA2u1Lrvc3t+bIXlQUg 7LDJBg5dmeqtDzQlGO5Ib9ai/UCM+JYsu0xoRoblijEpc/jd3ttgOeJUf 1y/MDyyrviQpTDIvCNd+wHjhLpHJpdwy9TOFuelfrKxuSp6YflzE7xlJa wqioBeOdP8JBda73K4GqUeedPyP8IR3vuMIsIt8I77tsZOIG16fFWZyhH Rrh1pzawgeya0e+mZCFGxTqvEvXNfNzrpUJnO+RQylwFFPm7VxILQcU4Y A==; X-IronPort-AV: E=McAfee;i="6400,9594,10392"; a="345828494" X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="345828494" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:49:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="680182656" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by FMSMGA003.fm.intel.com with ESMTP; 28 Jun 2022 12:49:51 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9g022013; Tue, 28 Jun 2022 20:49:49 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 42/52] net, xdp: shortcut skb->dev in bpf_prog_run_generic_xdp() Date: Tue, 28 Jun 2022 21:48:02 +0200 Message-Id: <20220628194812.1453059-43-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC It's being used 3 times and more to come. Fetch it onto the stack to reduce jumping back and forth. Signed-off-by: Alexander Lobakin --- net/bpf/dev.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/bpf/dev.c b/net/bpf/dev.c index cc43f73929f3..350ebdc783a0 100644 --- a/net/bpf/dev.c +++ b/net/bpf/dev.c @@ -31,6 +31,7 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { void *orig_data, *orig_data_end, *hard_start; + struct net_device *dev = skb->dev; struct netdev_rx_queue *rxqueue; bool orig_bcast, orig_host; u32 mac_len, frame_sz; @@ -57,7 +58,7 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, orig_data_end = xdp->data_end; orig_data = xdp->data; eth = (struct ethhdr *)xdp->data; - orig_host = ether_addr_equal_64bits(eth->h_dest, skb->dev->dev_addr); + orig_host = ether_addr_equal_64bits(eth->h_dest, dev->dev_addr); orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest); orig_eth_type = eth->h_proto; @@ -86,11 +87,11 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, eth = (struct ethhdr *)xdp->data; if ((orig_eth_type != eth->h_proto) || (orig_host != ether_addr_equal_64bits(eth->h_dest, - skb->dev->dev_addr)) || + dev->dev_addr)) || (orig_bcast != is_multicast_ether_addr_64bits(eth->h_dest))) { __skb_push(skb, ETH_HLEN); skb->pkt_type = PACKET_HOST; - skb->protocol = eth_type_trans(skb, skb->dev); + skb->protocol = eth_type_trans(skb, dev); } /* Redirect/Tx gives L2 packet, code that will reuse skb must __skb_pull From patchwork Tue Jun 28 19:48:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898864 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79BF0CCA47F for ; Tue, 28 Jun 2022 19:53:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232903AbiF1Tx2 (ORCPT ); Tue, 28 Jun 2022 15:53:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232494AbiF1Tu7 (ORCPT ); Tue, 28 Jun 2022 15:50:59 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 93B5D2E09A; Tue, 28 Jun 2022 12:49:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1656445797; x=1687981797; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen, Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng, Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [PATCH RFC bpf-next 43/52] net, xdp: build XDP generic metadata on Generic (skb) XDP path
Date: Tue, 28 Jun 2022 21:48:03 +0200
Message-Id: <20220628194812.1453059-44-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com>

Now that the core has the routine to build XDP generic metadata
from the skb fields and &net_device stores meta_thresh, provide XDP
generic metadata to BPF programs running on the Generic (skb) XDP
path. The skb fields are updated from the metadata after the BPF
program exits (if the metadata is still there); a rough sketch of
what a program sees on this path follows below.
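For illustration only, this is roughly what a BPF program would see
once the frame length passes meta_thresh (the demo struct layout
and its field names are guesses for demonstration, not the series'
actual definition of struct xdp_meta_generic):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	/* illustrative stand-in for the generic metadata layout */
	struct xdp_meta_generic_demo {
		__u32 rx_hash;		/* assumed example field */
		__le64 btf_id;		/* type ID, assumed at the meta end */
	} __attribute__((packed));

	SEC("xdp")
	int meta_demo(struct xdp_md *ctx)
	{
		void *data = (void *)(long)ctx->data;
		void *meta = (void *)(long)ctx->data_meta;
		struct xdp_meta_generic_demo *md = meta;

		/* frames shorter than meta_thresh get no metadata built */
		if (meta + sizeof(*md) > data)
			return XDP_PASS;

		/* metadata is usable without touching HW descriptors */
		return md->rx_hash ? XDP_PASS : XDP_DROP;
	}

	char _license[] SEC("license") = "GPL";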
Signed-off-by: Alexander Lobakin --- net/bpf/dev.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 47 insertions(+), 4 deletions(-) diff --git a/net/bpf/dev.c b/net/bpf/dev.c index 350ebdc783a0..f4187b357a0c 100644 --- a/net/bpf/dev.c +++ b/net/bpf/dev.c @@ -1,7 +1,20 @@ // SPDX-License-Identifier: GPL-2.0-only +#include #include +enum { + GENERIC_XDP_META_GEN, + + /* Must be last */ + GENERIC_XDP_META_NONE, + __GENERIC_XDP_META_NUM, +}; + +static const char * const generic_xdp_meta_types[__GENERIC_XDP_META_NUM] = { + [GENERIC_XDP_META_GEN] = "struct xdp_meta_generic", +}; + DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key); static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb) @@ -27,17 +40,33 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb) return rxqueue; } +static void generic_xdp_handle_meta(struct xdp_buff *xdp, struct sk_buff *skb, + const struct xdp_attachment_info *info) +{ + if (xdp->data_end - xdp->data < READ_ONCE(info->meta_thresh)) + return; + + switch (READ_ONCE(info->drv_cookie)) { + case GENERIC_XDP_META_GEN: + xdp_build_meta_generic_from_skb(skb); + xdp->data_meta = skb_metadata_end(skb) - skb_metadata_len(skb); + break; + default: + break; + } +} + u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { void *orig_data, *orig_data_end, *hard_start; struct net_device *dev = skb->dev; struct netdev_rx_queue *rxqueue; + u32 metalen, orig_metalen, act; bool orig_bcast, orig_host; u32 mac_len, frame_sz; __be16 orig_eth_type; struct ethhdr *eth; - u32 metalen, act; int off; /* The XDP program wants to see the packet starting at the MAC @@ -62,6 +91,9 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest); orig_eth_type = eth->h_proto; + generic_xdp_handle_meta(xdp, skb, &dev->xdp_info); + orig_metalen = xdp->data - xdp->data_meta; + act = bpf_prog_run_xdp(xdp_prog, xdp); /* check if bpf_xdp_adjust_head was used */ @@ -105,11 +137,15 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, case XDP_REDIRECT: case XDP_TX: __skb_push(skb, mac_len); - break; + fallthrough; case XDP_PASS: metalen = xdp->data - xdp->data_meta; - if (metalen) + if (metalen != orig_metalen) skb_metadata_set(skb, metalen); + if (metalen) + xdp_populate_skb_meta_generic(skb); + else if (orig_metalen) + skb_metadata_nocomp_clear(skb); break; } @@ -244,10 +280,15 @@ static void dev_disable_gro_hw(struct net_device *dev) static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp) { bool old = !!rtnl_dereference(dev->xdp_info.prog_rcu); - int ret = 0; + int ret; switch (xdp->command) { case XDP_SETUP_PROG: + ret = xdp_meta_match_id(generic_xdp_meta_types, xdp->btf_id); + if (ret < 0) + return ret; + + WRITE_ONCE(dev->xdp_info.drv_cookie, ret); xdp_attachment_setup_rcu(&dev->xdp_info, xdp); if (old && !xdp->prog) { @@ -257,6 +298,8 @@ static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp) dev_disable_lro(dev); dev_disable_gro_hw(dev); } + + ret = 0; break; default: From patchwork Tue Jun 28 19:48:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12898874 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org 
[23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52754C43334; Tue, 28 Jun 2022 19:54:02 +0000 (UTC)
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen, Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng, Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [PATCH RFC bpf-next 44/52] net, ice: allow XDP prog hot-swapping
Date: Tue, 28 Jun 2022 21:48:04 +0200
Message-Id: <20220628194812.1453059-45-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com>

Currently, an interface is always brought down on %XDP_SETUP_PROG,
regardless of whether it is a global configuration change
(no prog -> prog, prog -> no prog) or just a hot-swap
(prog -> prog). That is suboptimal, especially when
old_prog == new_prog, which should be a no-op. Moreover, it makes
it impossible to change some auxiliary XDP options on the fly,
which could be designed to work that way.
Store &xdp_attachment_info in just one copy inside the VSI
structure; RQs will only have pointers to it.
This way we only need to rewrite it once and xdp_attachment_setup_rcu() now may be used. Guard NAPI poll routines with RCU read locks to make sure the BPF prog won't get freed right in the middle of a cycle. Now the old program will be freed only when all of the rings will use the new one already. Then do an ifdown->ifup cycle in ::ndo_bpf() only if absolutely needed (mentioned above), the rest will be completely safe to do on the go. Signed-off-by: Alexander Lobakin --- drivers/net/ethernet/intel/ice/ice.h | 8 +-- drivers/net/ethernet/intel/ice/ice_lib.c | 4 +- drivers/net/ethernet/intel/ice/ice_main.c | 61 ++++++++++------------- drivers/net/ethernet/intel/ice/ice_txrx.c | 11 ++-- drivers/net/ethernet/intel/ice/ice_txrx.h | 2 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 2 +- 6 files changed, 40 insertions(+), 48 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 60453b3b8d23..402b71ab48e4 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -386,7 +386,7 @@ struct ice_vsi { u16 num_tx_desc; u16 qset_handle[ICE_MAX_TRAFFIC_CLASS]; struct ice_tc_cfg tc_cfg; - struct bpf_prog *xdp_prog; + struct xdp_attachment_info xdp_info; struct ice_tx_ring **xdp_rings; /* XDP ring array */ unsigned long *af_xdp_zc_qps; /* tracks AF_XDP ZC enabled qps */ u16 num_xdp_txq; /* Used XDP queues */ @@ -672,7 +672,7 @@ static inline struct ice_pf *ice_netdev_to_pf(struct net_device *netdev) static inline bool ice_is_xdp_ena_vsi(struct ice_vsi *vsi) { - return !!READ_ONCE(vsi->xdp_prog); + return !!rcu_access_pointer(vsi->xdp_info.prog_rcu); } static inline void ice_set_ring_xdp(struct ice_tx_ring *ring) @@ -857,8 +857,8 @@ int ice_down(struct ice_vsi *vsi); int ice_vsi_cfg(struct ice_vsi *vsi); struct ice_vsi *ice_lb_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi); int ice_vsi_determine_xdp_res(struct ice_vsi *vsi); -int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog); -int ice_destroy_xdp_rings(struct ice_vsi *vsi); +int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct netdev_bpf *xdp); +int ice_destroy_xdp_rings(struct ice_vsi *vsi, struct netdev_bpf *xdp); int ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, u32 flags); diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index b28fb8eacffb..3db1271b5176 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -3200,7 +3200,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi) /* return value check can be skipped here, it always returns * 0 if reset is in progress */ - ice_destroy_xdp_rings(vsi); + ice_destroy_xdp_rings(vsi, NULL); ice_vsi_put_qs(vsi); ice_vsi_clear_rings(vsi); ice_vsi_free_arrays(vsi); @@ -3248,7 +3248,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi) ret = ice_vsi_determine_xdp_res(vsi); if (ret) goto err_vectors; - ret = ice_prepare_xdp_rings(vsi, vsi->xdp_prog); + ret = ice_prepare_xdp_rings(vsi, NULL); if (ret) goto err_vectors; } diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index c1ac2f746714..7d049930a0a8 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2603,32 +2603,14 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi) return -ENOMEM; } -/** - * ice_vsi_assign_bpf_prog - set or clear bpf prog pointer on VSI - * @vsi: VSI to set the bpf prog on - * @prog: the 
bpf prog pointer - */ -static void ice_vsi_assign_bpf_prog(struct ice_vsi *vsi, struct bpf_prog *prog) -{ - struct bpf_prog *old_prog; - int i; - - old_prog = xchg(&vsi->xdp_prog, prog); - if (old_prog) - bpf_prog_put(old_prog); - - ice_for_each_rxq(vsi, i) - WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog); -} - /** * ice_prepare_xdp_rings - Allocate, configure and setup Tx rings for XDP * @vsi: VSI to bring up Tx rings used by XDP - * @prog: bpf program that will be assigned to VSI + * @xdp: &netdev_bpf with XDP program and additional data passed from the stack * * Return 0 on success and negative value on error */ -int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog) +int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct netdev_bpf *xdp) { u16 max_txqs[ICE_MAX_TRAFFIC_CLASS] = { 0 }; int xdp_rings_rem = vsi->num_xdp_txq; @@ -2713,8 +2695,8 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog) * this is not harmful as dev_xdp_install bumps the refcount * before calling the op exposed by the driver; */ - if (!ice_is_xdp_ena_vsi(vsi)) - ice_vsi_assign_bpf_prog(vsi, prog); + if (xdp) + xdp_attachment_setup_rcu(&vsi->xdp_info, xdp); return 0; clear_xdp_rings: @@ -2739,11 +2721,12 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog) /** * ice_destroy_xdp_rings - undo the configuration made by ice_prepare_xdp_rings * @vsi: VSI to remove XDP rings + * @xdp: &netdev_bpf with XDP program and additional data passed from the stack * * Detach XDP rings from irq vectors, clean up the PF bitmap and free * resources */ -int ice_destroy_xdp_rings(struct ice_vsi *vsi) +int ice_destroy_xdp_rings(struct ice_vsi *vsi, struct netdev_bpf *xdp) { u16 max_txqs[ICE_MAX_TRAFFIC_CLASS] = { 0 }; struct ice_pf *pf = vsi->back; @@ -2796,7 +2779,11 @@ int ice_destroy_xdp_rings(struct ice_vsi *vsi) if (ice_is_reset_in_progress(pf->state) || !vsi->q_vectors[0]) return 0; - ice_vsi_assign_bpf_prog(vsi, NULL); + /* Symmetrically to ice_prepare_xdp_rings(), touch XDP program only + * when called from ::ndo_bpf(). 
+ */ + if (xdp) + xdp_attachment_setup_rcu(&vsi->xdp_info, xdp); /* notify Tx scheduler that we destroyed XDP queues and bring * back the old number of child nodes @@ -2853,15 +2840,14 @@ int ice_vsi_determine_xdp_res(struct ice_vsi *vsi) /** * ice_xdp_setup_prog - Add or remove XDP eBPF program * @vsi: VSI to setup XDP for - * @prog: XDP program - * @extack: netlink extended ack + * @xdp: &netdev_bpf with XDP program and additional data passed from the stack */ static int -ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog, - struct netlink_ext_ack *extack) +ice_xdp_setup_prog(struct ice_vsi *vsi, struct netdev_bpf *xdp) { int frame_size = vsi->netdev->mtu + ICE_ETH_PKT_HDR_PAD; - bool if_running = netif_running(vsi->netdev); + struct netlink_ext_ack *extack = xdp->extack; + bool restart = false, prog = !!xdp->prog; int ret = 0, xdp_ring_err = 0; if (frame_size > vsi->rx_buf_len) { @@ -2870,12 +2856,15 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog, } /* need to stop netdev while setting up the program for Rx rings */ - if (if_running && !test_and_set_bit(ICE_VSI_DOWN, vsi->state)) { + if (ice_is_xdp_ena_vsi(vsi) != prog && netif_running(vsi->netdev) && + !test_and_set_bit(ICE_VSI_DOWN, vsi->state)) { ret = ice_down(vsi); if (ret) { NL_SET_ERR_MSG_MOD(extack, "Preparing device for XDP attach failed"); return ret; } + + restart = true; } if (!ice_is_xdp_ena_vsi(vsi) && prog) { @@ -2883,24 +2872,24 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog, if (xdp_ring_err) { NL_SET_ERR_MSG_MOD(extack, "Not enough Tx resources for XDP"); } else { - xdp_ring_err = ice_prepare_xdp_rings(vsi, prog); + xdp_ring_err = ice_prepare_xdp_rings(vsi, xdp); if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed"); } } else if (ice_is_xdp_ena_vsi(vsi) && !prog) { - xdp_ring_err = ice_destroy_xdp_rings(vsi); + xdp_ring_err = ice_destroy_xdp_rings(vsi, xdp); if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Freeing XDP Tx resources failed"); } else { - /* safe to call even when prog == vsi->xdp_prog as + /* safe to call even when prog == vsi->xdp_info.prog as * dev_xdp_install in net/core/dev.c incremented prog's * refcount so corresponding bpf_prog_put won't cause * underflow */ - ice_vsi_assign_bpf_prog(vsi, prog); + xdp_attachment_setup_rcu(&vsi->xdp_info, xdp); } - if (if_running) + if (restart) ret = ice_up(vsi); if (!ret && prog) @@ -2940,7 +2929,7 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp) switch (xdp->command) { case XDP_SETUP_PROG: - return ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack); + return ice_xdp_setup_prog(vsi, xdp); case XDP_SETUP_XSK_POOL: return ice_xsk_pool_setup(vsi, xdp->xsk.pool, xdp->xsk.queue_id); diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 3f8b7274ed2f..25383bbf8245 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -454,7 +454,7 @@ void ice_free_rx_ring(struct ice_rx_ring *rx_ring) if (rx_ring->vsi->type == ICE_VSI_PF) if (xdp_rxq_info_is_reg(&rx_ring->xdp_rxq)) xdp_rxq_info_unreg(&rx_ring->xdp_rxq); - rx_ring->xdp_prog = NULL; + if (rx_ring->xsk_pool) { kfree(rx_ring->xdp_buf); rx_ring->xdp_buf = NULL; @@ -507,8 +507,7 @@ int ice_setup_rx_ring(struct ice_rx_ring *rx_ring) rx_ring->next_to_use = 0; rx_ring->next_to_clean = 0; - if (ice_is_xdp_ena_vsi(rx_ring->vsi)) - WRITE_ONCE(rx_ring->xdp_prog, rx_ring->vsi->xdp_prog); + rx_ring->xdp_info = &rx_ring->vsi->xdp_info; if 
(rx_ring->vsi->type == ICE_VSI_PF && !xdp_rxq_info_is_reg(&rx_ring->xdp_rxq)) @@ -1123,7 +1122,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) #endif xdp_init_buff(&xdp, frame_sz, &rx_ring->xdp_rxq); - xdp_prog = READ_ONCE(rx_ring->xdp_prog); + xdp_prog = rcu_dereference(rx_ring->xdp_info->prog_rcu); if (xdp_prog) xdp_ring = rx_ring->xdp_ring; @@ -1489,6 +1488,8 @@ int ice_napi_poll(struct napi_struct *napi, int budget) /* Max of 1 Rx ring in this q_vector so give it the budget */ budget_per_ring = budget; + rcu_read_lock(); + ice_for_each_rx_ring(rx_ring, q_vector->rx) { int cleaned; @@ -1505,6 +1506,8 @@ int ice_napi_poll(struct napi_struct *napi, int budget) clean_complete = false; } + rcu_read_unlock(); + /* If work not completed, return budget and polling will return */ if (!clean_complete) { /* Set the writeback on ITR so partial completions of diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index ca902af54bb4..1fc31ab0bf33 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -290,7 +290,7 @@ struct ice_rx_ring { struct rcu_head rcu; /* to avoid race on free */ /* CL4 - 3rd cacheline starts here */ struct ice_channel *ch; - struct bpf_prog *xdp_prog; + const struct xdp_attachment_info *xdp_info; struct ice_tx_ring *xdp_ring; struct xsk_buff_pool *xsk_pool; struct sk_buff *skb; diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 49ba8bfdbf04..eb994cf68ff4 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -597,7 +597,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) /* ZC patch is enabled only when XDP program is set, * so here it can not be NULL */ - xdp_prog = READ_ONCE(rx_ring->xdp_prog); + xdp_prog = rcu_dereference(rx_ring->xdp_info->prog_rcu); xdp_ring = rx_ring->xdp_ring; while (likely(total_rx_packets < (unsigned int)budget)) {
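The reader/updater interplay in the hunks above is the classic RCU publish/read scheme. Condensed into a self-contained sketch (simplified names and a stand-in holder struct, not the exact driver code):

	/* A condensed sketch of the RCU scheme used above; names are
	 * simplified and this is not the actual xdp_attachment_info API.
	 */
	#include <linux/rcupdate.h>
	#include <linux/filter.h>

	struct prog_holder_sketch {
		struct bpf_prog __rcu *prog_rcu;
	};

	/* Updater side (::ndo_bpf()): publish the new prog, then release
	 * the old one once no NAPI poll can still be using it.
	 */
	static void sketch_swap_prog(struct prog_holder_sketch *h,
				     struct bpf_prog *new_prog)
	{
		struct bpf_prog *old;

		old = rcu_replace_pointer(h->prog_rcu, new_prog, true);
		if (old) {
			synchronize_rcu();	/* wait out in-flight polls */
			bpf_prog_put(old);
		}
	}

	/* Reader side (NAPI poll): the prog can't be freed while the read
	 * lock is held, which is exactly what the hunks above add.
	 */
	static void sketch_poll(struct prog_holder_sketch *h)
	{
		struct bpf_prog *prog;

		rcu_read_lock();
		prog = rcu_dereference(h->prog_rcu);
		if (prog)
			; /* run XDP for the whole polling cycle */
		rcu_read_unlock();
	}

The explicit synchronize_rcu() in the sketch only makes the ordering visible; in the driver, the same guarantee comes from taking the read lock around the entire NAPI polling cycle, as the hunks above do.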
From patchwork Tue Jun 28 19:48:05 2022
X-Patchwork-Id: 12898870
X-Patchwork-State: RFC
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen, Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng, Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [PATCH RFC bpf-next 45/52] net, ice: consolidate all skb fields processing
Date: Tue, 28 Jun 2022 21:48:05 +0200
Message-Id: <20220628194812.1453059-46-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>

Currently, skb field filling is scattered across the RQ / XSK RQ polling functions. Make it consistent and do everything in ice_process_skb_fields(). Obtaining @vlan_tag and @rx_ptype can be moved in there too; there is no reason to do it outside. ice_receive_skb() now becomes just a standard pair of eth_type_trans() + napi_gro_receive(), so make it static inline to save a couple of redundant jumps.
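The per-frame tail of both Rx routines then boils down to two calls; in sketch form (driver-internal types assumed, the real changes are in the diff below):

	/* Sketch of the per-frame completion tail after this patch; not
	 * the literal diff below, just the resulting call sequence.
	 */
	static void rx_complete_tail_sketch(struct ice_rx_ring *rx_ring,
					    union ice_32b_rx_flex_desc *rx_desc,
					    struct sk_buff *skb)
	{
		/* hash, checksum, VLAN tag and Rx queue, all in one place now */
		ice_process_skb_fields(rx_ring, rx_desc, skb);

		/* static inline eth_type_trans() + napi_gro_receive() pair */
		ice_receive_skb(rx_ring, skb);
	}
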
Signed-off-by: Alexander Lobakin --- drivers/net/ethernet/intel/ice/ice_txrx.c | 19 +---- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 81 +++++++++---------- drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 25 +++++- drivers/net/ethernet/intel/ice/ice_xsk.c | 11 +-- 4 files changed, 65 insertions(+), 71 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 25383bbf8245..ffea5138a7e8 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -949,11 +949,6 @@ ice_build_skb(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf, if (unlikely(!skb)) return NULL; - /* must to record Rx queue, otherwise OS features such as - * symmetric queue won't work - */ - skb_record_rx_queue(skb, rx_ring->q_index); - /* update pointers within the skb to store the data */ skb_reserve(skb, xdp->data - xdp->data_hard_start); __skb_put(skb, xdp->data_end - xdp->data); @@ -995,7 +990,6 @@ ice_construct_skb(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf, if (unlikely(!skb)) return NULL; - skb_record_rx_queue(skb, rx_ring->q_index); /* Determine available headroom for copy */ headlen = size; if (headlen > ICE_RX_HDR_SIZE) @@ -1134,8 +1128,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) unsigned int size; u16 stat_err_bits; int rx_buf_pgcnt; - u16 vlan_tag = 0; - u16 rx_ptype; /* get the Rx desc from Rx ring based on 'next_to_clean' */ rx_desc = ICE_RX_DESC(rx_ring, rx_ring->next_to_clean); @@ -1238,8 +1230,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) continue; } - vlan_tag = ice_get_vlan_tag_from_rx_desc(rx_desc); - /* pad the skb if needed, to make a valid ethernet frame */ if (eth_skb_pad(skb)) { skb = NULL; @@ -1249,15 +1239,10 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) /* probably a little skewed due to removing CRC */ total_rx_bytes += skb->len; - /* populate checksum, VLAN, and protocol */ - rx_ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) & - ICE_RX_FLEX_DESC_PTYPE_M; - - ice_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype); + ice_process_skb_fields(rx_ring, rx_desc, skb); ice_trace(clean_rx_irq_indicate, rx_ring, rx_desc, skb); - /* send completed skb up the stack */ - ice_receive_skb(rx_ring, skb, vlan_tag); + ice_receive_skb(rx_ring, skb); skb = NULL; /* update budget accounting */ diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index 7ee38d02d1e5..92c001baa2cc 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -40,16 +40,15 @@ void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val) /** * ice_ptype_to_htype - get a hash type - * @ptype: the ptype value from the descriptor + * @decoded: the decoded ptype value from the descriptor * * Returns appropriate hash type (such as PKT_HASH_TYPE_L2/L3/L4) to be used by * skb_set_hash based on PTYPE as parsed by HW Rx pipeline and is part of * Rx desc. 
*/ -static enum pkt_hash_types ice_ptype_to_htype(u16 ptype) +static enum pkt_hash_types +ice_ptype_to_htype(struct ice_rx_ptype_decoded decoded) { - struct ice_rx_ptype_decoded decoded = ice_decode_rx_desc_ptype(ptype); - if (!decoded.known) return PKT_HASH_TYPE_NONE; if (decoded.payload_layer == ICE_RX_PTYPE_PAYLOAD_LAYER_PAY4) @@ -67,11 +66,11 @@ static enum pkt_hash_types ice_ptype_to_htype(u16 ptype) * @rx_ring: descriptor ring * @rx_desc: specific descriptor * @skb: pointer to current skb - * @rx_ptype: the ptype value from the descriptor + * @decoded: the decoded ptype value from the descriptor */ static void ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, - struct sk_buff *skb, u16 rx_ptype) + struct sk_buff *skb, struct ice_rx_ptype_decoded decoded) { struct ice_32b_rx_flex_desc_nic *nic_mdid; u32 hash; @@ -84,7 +83,7 @@ ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc; hash = le32_to_cpu(nic_mdid->rss_hash); - skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype)); + skb_set_hash(skb, hash, ice_ptype_to_htype(decoded)); } /** @@ -92,23 +91,21 @@ ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, * @ring: the ring we care about * @skb: skb currently being received and modified * @rx_desc: the receive descriptor - * @ptype: the packet type decoded by hardware + * @decoded: the decoded packet type parsed by hardware * * skb->protocol must be set before this function is called */ static void ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, - union ice_32b_rx_flex_desc *rx_desc, u16 ptype) + union ice_32b_rx_flex_desc *rx_desc, + struct ice_rx_ptype_decoded decoded) { - struct ice_rx_ptype_decoded decoded; u16 rx_status0, rx_status1; bool ipv4, ipv6; rx_status0 = le16_to_cpu(rx_desc->wb.status_error0); rx_status1 = le16_to_cpu(rx_desc->wb.status_error1); - decoded = ice_decode_rx_desc_ptype(ptype); - /* Start with CHECKSUM_NONE and by default csum_level = 0 */ skb->ip_summed = CHECKSUM_NONE; skb_checksum_none_assert(skb); @@ -170,12 +167,31 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, ring->vsi->back->hw_csum_rx_error++; } +static void ice_rx_vlan(struct sk_buff *skb, + const struct ice_rx_ring *rx_ring, + const union ice_32b_rx_flex_desc *rx_desc) +{ + netdev_features_t features = rx_ring->netdev->features; + bool non_zero_vlan; + u16 vlan_tag; + + vlan_tag = ice_get_vlan_tag_from_rx_desc(rx_desc); + non_zero_vlan = !!(vlan_tag & VLAN_VID_MASK); + + if (!non_zero_vlan) + return; + + if ((features & NETIF_F_HW_VLAN_CTAG_RX)) + __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag); + else if ((features & NETIF_F_HW_VLAN_STAG_RX)) + __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD), vlan_tag); +} + /** * ice_process_skb_fields - Populate skb header fields from Rx descriptor * @rx_ring: Rx descriptor ring packet is being transacted on * @rx_desc: pointer to the EOP Rx descriptor * @skb: pointer to current skb being populated - * @ptype: the packet type decoded by hardware * * This function checks the ring, descriptor, and packet information in * order to populate the hash, checksum, VLAN, protocol, and @@ -184,42 +200,25 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, void ice_process_skb_fields(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, - struct sk_buff *skb, u16 ptype) + struct sk_buff *skb) { - ice_rx_hash(rx_ring, rx_desc, skb, ptype); + struct ice_rx_ptype_decoded decoded; + u16 
ptype; - /* modifies the skb - consumes the enet header */ - skb->protocol = eth_type_trans(skb, rx_ring->netdev); + skb_record_rx_queue(skb, rx_ring->q_index); - ice_rx_csum(rx_ring, skb, rx_desc, ptype); + ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) & + ICE_RX_FLEX_DESC_PTYPE_M; + decoded = ice_decode_rx_desc_ptype(ptype); + + ice_rx_hash(rx_ring, rx_desc, skb, decoded); + ice_rx_csum(rx_ring, skb, rx_desc, decoded); + ice_rx_vlan(skb, rx_ring, rx_desc); if (rx_ring->ptp_rx) ice_ptp_rx_hwtstamp(rx_ring, rx_desc, skb); } -/** - * ice_receive_skb - Send a completed packet up the stack - * @rx_ring: Rx ring in play - * @skb: packet to send up - * @vlan_tag: VLAN tag for packet - * - * This function sends the completed packet (via. skb) up the stack using - * gro receive functions (with/without VLAN tag) - */ -void -ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag) -{ - netdev_features_t features = rx_ring->netdev->features; - bool non_zero_vlan = !!(vlan_tag & VLAN_VID_MASK); - - if ((features & NETIF_F_HW_VLAN_CTAG_RX) && non_zero_vlan) - __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag); - else if ((features & NETIF_F_HW_VLAN_STAG_RX) && non_zero_vlan) - __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD), vlan_tag); - - napi_gro_receive(&rx_ring->q_vector->napi, skb); -} - /** * ice_clean_xdp_irq - Reclaim resources after transmit completes on XDP ring * @xdp_ring: XDP ring to clean diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h index c7d2954dc9ea..45dc5ef79e28 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h @@ -40,7 +40,7 @@ ice_build_ctob(u64 td_cmd, u64 td_offset, unsigned int size, u64 td_tag) * one is found return the tag, else return 0 to mean no VLAN tag was found. */ static inline u16 -ice_get_vlan_tag_from_rx_desc(union ice_32b_rx_flex_desc *rx_desc) +ice_get_vlan_tag_from_rx_desc(const union ice_32b_rx_flex_desc *rx_desc) { u16 stat_err_bits; @@ -55,6 +55,24 @@ ice_get_vlan_tag_from_rx_desc(union ice_32b_rx_flex_desc *rx_desc) return 0; } +/** + * ice_receive_skb - Send a completed packet up the stack + * @rx_ring: Rx ring in play + * @skb: packet to send up + * + * This function sends the completed packet (via. 
skb) up the stack using + * gro receive functions + */ +static inline void ice_receive_skb(const struct ice_rx_ring *rx_ring, + struct sk_buff *skb) +{ + /* modifies the skb - consumes the enet header */ + skb->protocol = eth_type_trans(skb, rx_ring->netdev); + + /* send completed skb up the stack */ + napi_gro_receive(&rx_ring->q_vector->napi, skb); +} + /** * ice_xdp_ring_update_tail - Updates the XDP Tx ring tail register * @xdp_ring: XDP Tx ring @@ -77,7 +95,6 @@ void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val); void ice_process_skb_fields(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, - struct sk_buff *skb, u16 ptype); -void -ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag); + struct sk_buff *skb); + #endif /* !_ICE_TXRX_LIB_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index eb994cf68ff4..0a66128964e7 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -606,8 +606,6 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) struct xdp_buff *xdp; struct sk_buff *skb; u16 stat_err_bits; - u16 vlan_tag = 0; - u16 rx_ptype; rx_desc = ICE_RX_DESC(rx_ring, rx_ring->next_to_clean); @@ -675,13 +673,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) total_rx_bytes += skb->len; total_rx_packets++; - vlan_tag = ice_get_vlan_tag_from_rx_desc(rx_desc); - - rx_ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) & - ICE_RX_FLEX_DESC_PTYPE_M; - - ice_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype); - ice_receive_skb(rx_ring, skb, vlan_tag); + ice_process_skb_fields(rx_ring, rx_desc, skb); + ice_receive_skb(rx_ring, skb); } entries_to_alloc = ICE_DESC_UNUSED(rx_ring);
E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="280596055" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2022 12:50:01 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,229,1650956400"; d="scan'208";a="594927668" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by fmsmga007.fm.intel.com with ESMTP; 28 Jun 2022 12:49:57 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9k022013; Tue, 28 Jun 2022 20:49:55 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 46/52] net, ice: use an onstack &xdp_meta_generic_rx to store HW frame info Date: Tue, 28 Jun 2022 21:48:06 +0200 Message-Id: <20220628194812.1453059-47-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC To be able to pass HW-provided frame metadata, such as hash, checksum status etc., to BPF and XSK programs, unify the container which is used to store it regardless of an XDP program presence or a verdict returned by it. Use an intermediate onstack &xdp_meta_generic_rx before filling skb fields and switch descriptor parsing functions to use it instead of an &sk_buff. This works the same way how &xdp_buff is being filled before forming an skb. If metadata generation is enabled, the actual space in front of a frame will be used in the upcoming changes. Using &xdp_meta_generic_rx instead of full-blown &xdp_meta_generic reduces text size by 32 bytes per function. Signed-off-by: Alexander Lobakin --- drivers/net/ethernet/intel/ice/ice_ptp.c | 19 ++-- drivers/net/ethernet/intel/ice/ice_ptp.h | 17 ++- drivers/net/ethernet/intel/ice/ice_txrx.c | 4 +- drivers/net/ethernet/intel/ice/ice_txrx.h | 1 + drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 105 ++++++++++-------- drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 12 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 4 +- 7 files changed, 91 insertions(+), 71 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index ef9344ef0d8e..d4d955152682 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -1795,24 +1795,22 @@ int ice_ptp_set_ts_config(struct ice_pf *pf, struct ifreq *ifr) /** * ice_ptp_rx_hwtstamp - Check for an Rx timestamp - * @rx_ring: Ring to get the VSI info * @rx_desc: Receive descriptor - * @skb: Particular skb to send timestamp with + * @rx_ring: Ring to get the VSI info + * @md: Metadata to set timestamp in * * The driver receives a notification in the receive descriptor with timestamp. * The timestamp is in ns, so we must convert the result first. 
*/ -void -ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring, - union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb) +void ice_ptp_rx_hwtstamp(struct xdp_meta_generic *md, + const union ice_32b_rx_flex_desc *rx_desc, + const struct ice_rx_ring *rx_ring) { u32 ts_high; u64 ts_ns; - /* Populate timesync data into skb */ + /* Populate timesync data into md */ if (rx_desc->wb.time_stamp_low & ICE_PTP_TS_VALID) { - struct skb_shared_hwtstamps *hwtstamps; - /* Use ice_ptp_extend_32b_ts directly, using the ring-specific * cached PHC value, rather than accessing the PF. This also * allows us to simply pass the upper 32bits of nanoseconds @@ -1822,9 +1820,8 @@ ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring, ts_high = le32_to_cpu(rx_desc->wb.flex_ts.ts_high); ts_ns = ice_ptp_extend_32b_ts(rx_ring->cached_phctime, ts_high); - hwtstamps = skb_hwtstamps(skb); - memset(hwtstamps, 0, sizeof(*hwtstamps)); - hwtstamps->hwtstamp = ns_to_ktime(ts_ns); + xdp_meta_rx_tstamp_present_set(md, 1); + xdp_meta_rx_tstamp_set(md, ts_ns); } } diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.h b/drivers/net/ethernet/intel/ice/ice_ptp.h index 10e396abf130..488b6bb01605 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.h +++ b/drivers/net/ethernet/intel/ice/ice_ptp.h @@ -228,8 +228,12 @@ struct ice_ptp { #define N_EXT_TS_E810_NO_SMA 2 #define ETH_GLTSYN_ENA(_i) (0x03000348 + ((_i) * 4)) -#if IS_ENABLED(CONFIG_PTP_1588_CLOCK) struct ice_pf; +struct ice_rx_ring; +struct xdp_meta_generic; +union ice_32b_rx_flex_desc; + +#if IS_ENABLED(CONFIG_PTP_1588_CLOCK) int ice_ptp_set_ts_config(struct ice_pf *pf, struct ifreq *ifr); int ice_ptp_get_ts_config(struct ice_pf *pf, struct ifreq *ifr); void ice_ptp_cfg_timestamp(struct ice_pf *pf, bool ena); @@ -238,9 +242,9 @@ int ice_get_ptp_clock_index(struct ice_pf *pf); s8 ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb); void ice_ptp_process_ts(struct ice_pf *pf); -void -ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring, - union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb); +void ice_ptp_rx_hwtstamp(struct xdp_meta_generic *md, + const union ice_32b_rx_flex_desc *rx_desc, + const struct ice_rx_ring *rx_ring); void ice_ptp_reset(struct ice_pf *pf); void ice_ptp_prepare_for_reset(struct ice_pf *pf); void ice_ptp_init(struct ice_pf *pf); @@ -271,8 +275,9 @@ ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb) static inline void ice_ptp_process_ts(struct ice_pf *pf) { } static inline void -ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring, - union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb) { } +ice_ptp_rx_hwtstamp(struct xdp_meta_generic *md, + const union ice_32b_rx_flex_desc *rx_desc, + const struct ice_rx_ring *rx_ring) { } static inline void ice_ptp_reset(struct ice_pf *pf) { } static inline void ice_ptp_prepare_for_reset(struct ice_pf *pf) { } static inline void ice_ptp_init(struct ice_pf *pf) { } diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index ffea5138a7e8..c679f7c30bdc 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -1123,6 +1123,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) /* start the loop to process Rx packets bounded by 'budget' */ while (likely(total_rx_pkts < (unsigned int)budget)) { union ice_32b_rx_flex_desc *rx_desc; + struct xdp_meta_generic_rx md; struct ice_rx_buf *rx_buf; unsigned char *hard_start; unsigned int size; @@ -1239,7 +1240,8 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int 
budget) /* probably a little skewed due to removing CRC */ total_rx_bytes += skb->len; - ice_process_skb_fields(rx_ring, rx_desc, skb); + ice_xdp_build_meta(&md, rx_desc, rx_ring, 0); + __xdp_populate_skb_meta_generic(skb, &md); ice_trace(clean_rx_irq_indicate, rx_ring, rx_desc, skb); ice_receive_skb(rx_ring, skb); diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index 1fc31ab0bf33..a814709deb50 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -4,6 +4,7 @@ #ifndef _ICE_TXRX_H_ #define _ICE_TXRX_H_ +#include #include "ice_type.h" #define ICE_DFLT_IRQ_WORK 256 diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index 92c001baa2cc..7550e2ed8936 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -43,36 +43,37 @@ void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val) * @decoded: the decoded ptype value from the descriptor * * Returns appropriate hash type (such as PKT_HASH_TYPE_L2/L3/L4) to be used by - * skb_set_hash based on PTYPE as parsed by HW Rx pipeline and is part of - * Rx desc. + * xdp_meta_rx_hash_type_set() based on PTYPE as parsed by HW Rx pipeline and + * is part of Rx desc. */ -static enum pkt_hash_types +static u32 ice_ptype_to_htype(struct ice_rx_ptype_decoded decoded) { if (!decoded.known) - return PKT_HASH_TYPE_NONE; + return XDP_META_RX_HASH_NONE; if (decoded.payload_layer == ICE_RX_PTYPE_PAYLOAD_LAYER_PAY4) - return PKT_HASH_TYPE_L4; + return XDP_META_RX_HASH_L4; if (decoded.payload_layer == ICE_RX_PTYPE_PAYLOAD_LAYER_PAY3) - return PKT_HASH_TYPE_L3; + return XDP_META_RX_HASH_L3; if (decoded.outer_ip == ICE_RX_PTYPE_OUTER_L2) - return PKT_HASH_TYPE_L2; + return XDP_META_RX_HASH_L2; - return PKT_HASH_TYPE_NONE; + return XDP_META_RX_HASH_NONE; } /** - * ice_rx_hash - set the hash value in the skb + * ice_rx_hash - set the hash value in the metadata + * @md: pointer to current metadata * @rx_ring: descriptor ring * @rx_desc: specific descriptor * @decoded: the decoded ptype value from the descriptor */ -static void -ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, - struct sk_buff *skb, struct ice_rx_ptype_decoded decoded) +static void ice_rx_hash(struct xdp_meta_generic *md, + const struct ice_rx_ring *rx_ring, + const union ice_32b_rx_flex_desc *rx_desc, + struct ice_rx_ptype_decoded decoded) { - struct ice_32b_rx_flex_desc_nic *nic_mdid; + const struct ice_32b_rx_flex_desc_nic *nic_mdid; u32 hash; if (!(rx_ring->netdev->features & NETIF_F_RXHASH)) @@ -81,24 +82,24 @@ ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC) return; - nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc; + nic_mdid = (typeof(nic_mdid))rx_desc; hash = le32_to_cpu(nic_mdid->rss_hash); - skb_set_hash(skb, hash, ice_ptype_to_htype(decoded)); + + xdp_meta_rx_hash_type_set(md, ice_ptype_to_htype(decoded)); + xdp_meta_rx_hash_set(md, hash); } /** - * ice_rx_csum - Indicate in skb if checksum is good + * ice_rx_csum - Indicate in metadata if checksum is good + * @md: metadata currently being filled + * @ring: the ring we care about - * @skb: skb currently being received and modified * @rx_desc: the receive descriptor * @decoded: the decoded packet type parsed by hardware */ -static
void -ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, - union ice_32b_rx_flex_desc *rx_desc, - struct ice_rx_ptype_decoded decoded) +static void ice_rx_csum(struct xdp_meta_generic *md, + const struct ice_rx_ring *ring, + const union ice_32b_rx_flex_desc *rx_desc, + struct ice_rx_ptype_decoded decoded) { u16 rx_status0, rx_status1; bool ipv4, ipv6; @@ -106,10 +107,6 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, rx_status0 = le16_to_cpu(rx_desc->wb.status_error0); rx_status1 = le16_to_cpu(rx_desc->wb.status_error1); - /* Start with CHECKSUM_NONE and by default csum_level = 0 */ - skb->ip_summed = CHECKSUM_NONE; - skb_checksum_none_assert(skb); - /* check if Rx checksum is enabled */ if (!(ring->netdev->features & NETIF_F_RXCSUM)) return; @@ -149,14 +146,14 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, * we are indicating we validated the inner checksum. */ if (decoded.tunnel_type >= ICE_RX_PTYPE_TUNNEL_IP_GRENAT) - skb->csum_level = 1; + xdp_meta_rx_csum_level_set(md, 1); /* Only report checksum unnecessary for TCP, UDP, or SCTP */ switch (decoded.inner_prot) { case ICE_RX_PTYPE_INNER_PROT_TCP: case ICE_RX_PTYPE_INNER_PROT_UDP: case ICE_RX_PTYPE_INNER_PROT_SCTP: - skb->ip_summed = CHECKSUM_UNNECESSARY; + xdp_meta_rx_csum_status_set(md, XDP_META_RX_CSUM_OK); break; default: break; @@ -167,7 +164,13 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, ring->vsi->back->hw_csum_rx_error++; } -static void ice_rx_vlan(struct sk_buff *skb, +#define xdp_meta_rx_vlan_from_feat(feat) ({ \ + ((feat) & NETIF_F_HW_VLAN_CTAG_RX) ? XDP_META_RX_CVID : \ + ((feat) & NETIF_F_HW_VLAN_STAG_RX) ? XDP_META_RX_SVID : \ + XDP_META_RX_VLAN_NONE; \ +}) + +static void ice_rx_vlan(struct xdp_meta_generic *md, const struct ice_rx_ring *rx_ring, const union ice_32b_rx_flex_desc *rx_desc) { @@ -181,42 +184,48 @@ static void ice_rx_vlan(struct sk_buff *skb, if (!non_zero_vlan) return; - if ((features & NETIF_F_HW_VLAN_CTAG_RX)) - __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag); - else if ((features & NETIF_F_HW_VLAN_STAG_RX)) - __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD), vlan_tag); + xdp_meta_rx_vlan_type_set(md, xdp_meta_rx_vlan_from_feat(features)); + xdp_meta_rx_vid_set(md, vlan_tag); } /** - * ice_process_skb_fields - Populate skb header fields from Rx descriptor - * @rx_ring: Rx descriptor ring packet is being transacted on + * __ice_xdp_build_meta - Populate XDP generic metadata fields from Rx desc + * @rx_md: pointer to the metadata structure to be populated * @rx_desc: pointer to the EOP Rx descriptor - * @skb: pointer to current skb being populated + * @rx_ring: Rx descriptor ring packet is being transacted on + * @full_id: full ID (BTF ID + type ID) to fill in * * This function checks the ring, descriptor, and packet information in * order to populate the hash, checksum, VLAN, protocol, and - * other fields within the skb. + * other fields within the metadata. 
*/ -void -ice_process_skb_fields(struct ice_rx_ring *rx_ring, - union ice_32b_rx_flex_desc *rx_desc, - struct sk_buff *skb) +void __ice_xdp_build_meta(struct xdp_meta_generic_rx *rx_md, + const union ice_32b_rx_flex_desc *rx_desc, + const struct ice_rx_ring *rx_ring, + __le64 full_id) { + struct xdp_meta_generic *md = to_gen_md(rx_md); struct ice_rx_ptype_decoded decoded; u16 ptype; - skb_record_rx_queue(skb, rx_ring->q_index); + xdp_meta_init(&md->id, full_id); + md->rx_hash = 0; + md->rx_csum = 0; + md->rx_flags = 0; + + xdp_meta_rx_qid_present_set(md, 1); + xdp_meta_rx_qid_set(md, rx_ring->q_index); ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) & ICE_RX_FLEX_DESC_PTYPE_M; decoded = ice_decode_rx_desc_ptype(ptype); - ice_rx_hash(rx_ring, rx_desc, skb, decoded); - ice_rx_csum(rx_ring, skb, rx_desc, decoded); - ice_rx_vlan(skb, rx_ring, rx_desc); + ice_rx_hash(md, rx_ring, rx_desc, decoded); + ice_rx_csum(md, rx_ring, rx_desc, decoded); + ice_rx_vlan(md, rx_ring, rx_desc); if (rx_ring->ptp_rx) - ice_ptp_rx_hwtstamp(rx_ring, rx_desc, skb); + ice_ptp_rx_hwtstamp(md, rx_desc, rx_ring); } /** diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h index 45dc5ef79e28..b51e58b8e83d 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h @@ -92,9 +92,13 @@ void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res); int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring); int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring); void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val); -void -ice_process_skb_fields(struct ice_rx_ring *rx_ring, - union ice_32b_rx_flex_desc *rx_desc, - struct sk_buff *skb); + +void __ice_xdp_build_meta(struct xdp_meta_generic_rx *rx_md, + const union ice_32b_rx_flex_desc *rx_desc, + const struct ice_rx_ring *rx_ring, + __le64 full_id); + +#define ice_xdp_build_meta(md, ...) 
\ + __ice_xdp_build_meta(to_rx_md(md), ##__VA_ARGS__) #endif /* !_ICE_TXRX_LIB_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 0a66128964e7..eade918723eb 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -603,6 +603,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) while (likely(total_rx_packets < (unsigned int)budget)) { union ice_32b_rx_flex_desc *rx_desc; unsigned int size, xdp_res = 0; + struct xdp_meta_generic_rx md; struct xdp_buff *xdp; struct sk_buff *skb; u16 stat_err_bits; @@ -673,7 +674,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) total_rx_bytes += skb->len; total_rx_packets++; - ice_process_skb_fields(rx_ring, rx_desc, skb); + ice_xdp_build_meta(&md, rx_desc, rx_ring, 0); + __xdp_populate_skb_meta_generic(skb, &md); ice_receive_skb(rx_ring, skb); }
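Condensed, the flow this patch establishes is: parse the descriptor once into a small stack container, then apply it to the skb in one go. In sketch form (the *_sketch names are illustrative stand-ins for ice_xdp_build_meta() / __xdp_populate_skb_meta_generic(), not the series' actual API or field layout):

	struct rx_meta_sketch {
		u32 hash;
		u32 hash_type;
		u16 vlan_tag;
		bool csum_ok;
		u64 tstamp;
	};

	/* stand-in for ice_xdp_build_meta(): decode ptype, hash, checksum,
	 * VLAN and timestamp from the Rx descriptor into *md
	 */
	static void fill_meta_sketch(struct rx_meta_sketch *md,
				     const union ice_32b_rx_flex_desc *rx_desc);

	/* stand-in for __xdp_populate_skb_meta_generic(): transfer the
	 * collected fields into the skb
	 */
	static void apply_meta_sketch(struct sk_buff *skb,
				      const struct rx_meta_sketch *md);

	static void rx_frame_sketch(struct ice_rx_ring *rx_ring,
				    union ice_32b_rx_flex_desc *rx_desc,
				    struct sk_buff *skb)
	{
		struct rx_meta_sketch md;	/* same lifetime as an &xdp_buff */

		fill_meta_sketch(&md, rx_desc);
		apply_meta_sketch(skb, &md);
	}
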
From patchwork Tue Jun 28 19:48:07 2022
X-Patchwork-Id: 12898868
X-Patchwork-State: RFC
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen, Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng, Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [PATCH RFC bpf-next 47/52] net, ice: build XDP generic metadata
Date: Tue, 28 Jun 2022 21:48:07 +0200
Message-Id: <20220628194812.1453059-48-alexandr.lobakin@intel.com>
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>

Now that the driver builds skbs from an onstack generic meta structure, add the ability to configure the actual metadata format to be provided to BPF and XSK programs (and other consumers like cpumap). The metadata is first built on the stack and then synchronized with the buffer space in front of the frame, and vice versa after the program returns to the driver. When metadata is disabled or the frame size is below the threshold, the driver populates it only on %XDP_PASS, right before populating an skb, so there is no performance hit in that case. Signed-off-by: Alexander Lobakin --- drivers/net/ethernet/intel/ice/ice.h | 8 +++ drivers/net/ethernet/intel/ice/ice_main.c | 18 ++++++- drivers/net/ethernet/intel/ice/ice_txrx.c | 25 ++++++--- drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 53 +++++++++++++++++++ drivers/net/ethernet/intel/ice/ice_xsk.c | 17 ++++-- 5 files changed, 107 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 402b71ab48e4..bd929bb1a359 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -490,6 +490,14 @@ enum ice_pf_flags { ICE_PF_FLAGS_NBITS /* must be last */ }; +enum { + ICE_MD_GENERIC, + + /* Must be last */ + ICE_MD_NONE, + __ICE_MD_NUM, +}; + struct ice_switchdev_info { struct ice_vsi *control_vsi; struct ice_vsi *uplink_vsi; diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 7d049930a0a8..62bd0d316873 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -48,6 +48,11 @@ static DEFINE_IDA(ice_aux_ida); DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key); EXPORT_SYMBOL(ice_xdp_locking_key); +/* List of XDP metadata formats supported by the driver */ +static const char * const ice_supported_md[__ICE_MD_NUM] = { + [ICE_MD_GENERIC] = "struct xdp_meta_generic", +}; + /** * ice_hw_to_dev - Get device pointer from the hardware structure * @hw: pointer to the device HW structure @@ -2848,13 +2853,19 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct netdev_bpf *xdp) int frame_size = vsi->netdev->mtu + ICE_ETH_PKT_HDR_PAD; struct netlink_ext_ack *extack = xdp->extack; bool restart = false, prog = !!xdp->prog; - int ret = 0, xdp_ring_err = 0; + int pos, ret = 0, xdp_ring_err = 0; if (frame_size > vsi->rx_buf_len) { NL_SET_ERR_MSG_MOD(extack, "MTU too large for loading XDP"); return -EOPNOTSUPP; } + pos = xdp_meta_match_id(ice_supported_md, xdp->btf_id); + if (pos < 0) { + NL_SET_ERR_MSG_MOD(extack, "Invalid or unsupported BTF ID"); + return pos; + } + /* need to stop netdev while setting up the program for Rx rings */ if (ice_is_xdp_ena_vsi(vsi) != prog && netif_running(vsi->netdev) && !test_and_set_bit(ICE_VSI_DOWN,
vsi->state)) { @@ -2867,6 +2878,9 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct netdev_bpf *xdp) restart = true; } + /* Paired with the READ_ONCE()'s in ice_clean_rx_irq{,_zc}() */ + WRITE_ONCE(vsi->xdp_info.drv_cookie, ICE_MD_NONE); + if (!ice_is_xdp_ena_vsi(vsi) && prog) { xdp_ring_err = ice_vsi_determine_xdp_res(vsi); if (xdp_ring_err) { @@ -2889,6 +2903,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct netdev_bpf *xdp) xdp_attachment_setup_rcu(&vsi->xdp_info, xdp); } + WRITE_ONCE(vsi->xdp_info.drv_cookie, pos); + if (restart) ret = ice_up(vsi); diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index c679f7c30bdc..50de6d54e3b0 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -1103,10 +1103,10 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) unsigned int total_rx_bytes = 0, total_rx_pkts = 0, frame_sz = 0; u16 cleaned_count = ICE_DESC_UNUSED(rx_ring); unsigned int offset = rx_ring->rx_offset; + struct xdp_attachment_info xdp_info; struct ice_tx_ring *xdp_ring = NULL; unsigned int xdp_res, xdp_xmit = 0; struct sk_buff *skb = rx_ring->skb; - struct bpf_prog *xdp_prog = NULL; struct xdp_buff xdp; bool failure; @@ -1116,9 +1116,16 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) #endif xdp_init_buff(&xdp, frame_sz, &rx_ring->xdp_rxq); - xdp_prog = rcu_dereference(rx_ring->xdp_info->prog_rcu); - if (xdp_prog) + xdp_info.prog = rcu_dereference(rx_ring->xdp_info->prog_rcu); + if (xdp_info.prog) { + const struct xdp_attachment_info *info = rx_ring->xdp_info; + + xdp_info.btf_id_le = cpu_to_le64(READ_ONCE(info->btf_id)); + xdp_info.meta_thresh = READ_ONCE(info->meta_thresh); + xdp_info.drv_cookie = READ_ONCE(info->drv_cookie); + xdp_ring = rx_ring->xdp_ring; + } /* start the loop to process Rx packets bounded by 'budget' */ while (likely(total_rx_pkts < (unsigned int)budget)) { @@ -1182,10 +1189,12 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size); #endif - if (!xdp_prog) + if (!xdp_info.prog) goto construct_skb; - xdp_res = ice_run_xdp(rx_ring, &xdp, xdp_prog, xdp_ring); + ice_xdp_handle_meta(&xdp, &md, &xdp_info, rx_desc, rx_ring); + + xdp_res = ice_run_xdp(rx_ring, &xdp, xdp_info.prog, xdp_ring); if (!xdp_res) goto construct_skb; if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) { @@ -1240,8 +1249,8 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) /* probably a little skewed due to removing CRC */ total_rx_bytes += skb->len; - ice_xdp_build_meta(&md, rx_desc, rx_ring, 0); - __xdp_populate_skb_meta_generic(skb, &md); + ice_xdp_meta_populate_skb(skb, &md, xdp.data, rx_desc, + rx_ring); ice_trace(clean_rx_irq_indicate, rx_ring, rx_desc, skb); ice_receive_skb(rx_ring, skb); @@ -1254,7 +1263,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) /* return up to cleaned_count buffers to hardware */ failure = ice_alloc_rx_bufs(rx_ring, cleaned_count); - if (xdp_prog) + if (xdp_info.prog) ice_finalize_xdp_rx(xdp_ring, xdp_xmit); rx_ring->skb = skb; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h index b51e58b8e83d..a9d3f3adf86b 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h @@ -98,7 +98,60 @@ void __ice_xdp_build_meta(struct xdp_meta_generic_rx *rx_md, const struct ice_rx_ring *rx_ring, __le64 full_id); +static inline void +__ice_xdp_handle_meta(struct 
xdp_buff *xdp, struct xdp_meta_generic_rx *rx_md, + const struct xdp_attachment_info *info, + const union ice_32b_rx_flex_desc *rx_desc, + const struct ice_rx_ring *rx_ring) +{ + rx_md->rx_flags = 0; + + if (xdp->data_end - xdp->data < info->meta_thresh) + return; + + switch (info->drv_cookie) { + case ICE_MD_GENERIC: + __ice_xdp_build_meta(rx_md, rx_desc, rx_ring, info->btf_id_le); + + xdp->data_meta = xdp_meta_generic_ptr(xdp->data); + memcpy(to_rx_md(xdp->data_meta), rx_md, sizeof(*rx_md)); + + /* Just zero Tx flags instead of zeroing the whole part */ + to_gen_md(xdp->data_meta)->tx_flags = 0; + break; + default: + break; + } +} + +static inline void +__ice_xdp_meta_populate_skb(struct sk_buff *skb, + struct xdp_meta_generic_rx *rx_md, + const void *data, + const union ice_32b_rx_flex_desc *rx_desc, + const struct ice_rx_ring *rx_ring) +{ + /* __ice_xdp_build_meta() unconditionally sets Rx queue id. If it's + * not here, it means that metadata for this frame hasn't been built + * yet and we need to do this now. Otherwise, sync onstack metadata + * copy and mark meta as nocomp to ignore it on GRO layer. + */ + if (rx_md->rx_flags && likely(xdp_meta_has_generic(data))) { + memcpy(rx_md, to_rx_md(xdp_meta_generic_ptr(data)), + sizeof(*rx_md)); + skb_metadata_nocomp_set(skb); + } else { + __ice_xdp_build_meta(rx_md, rx_desc, rx_ring, 0); + } + + __xdp_populate_skb_meta_generic(skb, rx_md); +} + #define ice_xdp_build_meta(md, ...) \ __ice_xdp_build_meta(to_rx_md(md), ##__VA_ARGS__) +#define ice_xdp_handle_meta(xdp, md, ...) \ + __ice_xdp_handle_meta((xdp), to_rx_md(md), ##__VA_ARGS__) +#define ice_xdp_meta_populate_skb(skb, md, ...) \ + __ice_xdp_meta_populate_skb((skb), to_rx_md(md), ##__VA_ARGS__) #endif /* !_ICE_TXRX_LIB_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index eade918723eb..f5769f49e3c3 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -588,16 +588,20 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) { unsigned int total_rx_bytes = 0, total_rx_packets = 0; + const struct xdp_attachment_info *rxi = rx_ring->xdp_info, xdp_info = { + .prog = rcu_dereference(rxi->prog_rcu), + .btf_id_le = cpu_to_le64(READ_ONCE(rxi->btf_id)), + .meta_thresh = READ_ONCE(rxi->meta_thresh), + .drv_cookie = READ_ONCE(rxi->drv_cookie), + }; struct ice_tx_ring *xdp_ring; unsigned int xdp_xmit = 0; - struct bpf_prog *xdp_prog; bool failure = false; int entries_to_alloc; /* ZC patch is enabled only when XDP program is set, * so here it can not be NULL */ - xdp_prog = rcu_dereference(rx_ring->xdp_info->prog_rcu); xdp_ring = rx_ring->xdp_ring; while (likely(total_rx_packets < (unsigned int)budget)) { @@ -638,7 +642,10 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) xsk_buff_set_size(xdp, size); xsk_buff_dma_sync_for_cpu(xdp, rx_ring->xsk_pool); - xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_prog, xdp_ring); + ice_xdp_handle_meta(xdp, &md, &xdp_info, rx_desc, rx_ring); + + xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_info.prog, + xdp_ring); if (likely(xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))) { xdp_xmit |= xdp_res; } else if (xdp_res == ICE_XDP_EXIT) { @@ -674,8 +681,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) total_rx_bytes += skb->len; total_rx_packets++; - ice_xdp_build_meta(&md, rx_desc, rx_ring, 0); - __xdp_populate_skb_meta_generic(skb, &md); + 
ice_xdp_meta_populate_skb(skb, &md, xdp->data, rx_desc, + rx_ring); ice_receive_skb(rx_ring, skb); }
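From the BPF program's point of view, the metadata composed above can be consumed roughly as follows. This is a sketch only: the struct layout is an assumption that merely approximates the series' &xdp_meta_generic, and the bounds check is the standard data_meta one (in the real layout the fields are little-endian and would go through the bpf_le*_to_cpu() helpers added two patches below):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	/* illustrative approximation of xdp_meta_generic, not the real layout */
	struct xdp_meta_generic_sketch {
		__u64 id;		/* BTF ID + type ID */
		__u32 rx_hash;
		__u32 rx_csum;
		__u32 rx_flags;
	};

	SEC("xdp")
	int rx_hints_sketch(struct xdp_md *ctx)
	{
		void *data = (void *)(long)ctx->data;
		struct xdp_meta_generic_sketch *md = (void *)(long)ctx->data_meta;

		/* metadata is only present when the frame met the threshold */
		if ((void *)(md + 1) > data)
			return XDP_PASS;

		bpf_printk("rx_hash %u", md->rx_hash);
		return XDP_PASS;
	}

	char LICENSE[] SEC("license") = "GPL";
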
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 48/52] libbpf: compress Endianness ops with a macro Date: Tue, 28 Jun 2022 21:48:08 +0200 Message-Id: <20220628194812.1453059-49-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC All of the Endianness helpers for BPF programs have the same pattern and can be defined using a compression macro, which will also protect against typos and copy-paste mistakes. Not speaking of saving locs, of course. Ahh, if we only could define macros inside other macros. Signed-off-by: Alexander Lobakin --- tools/lib/bpf/bpf_endian.h | 26 +++++++++----------------- 1 file changed, 9 insertions(+), 17 deletions(-) diff --git a/tools/lib/bpf/bpf_endian.h b/tools/lib/bpf/bpf_endian.h index ec9db4feca9f..b03db6aa3f14 100644 --- a/tools/lib/bpf/bpf_endian.h +++ b/tools/lib/bpf/bpf_endian.h @@ -77,23 +77,15 @@ # error "Fix your compiler's __BYTE_ORDER__?!" #endif -#define bpf_htons(x) \ +#define __bpf_endop(op, x) \ (__builtin_constant_p(x) ? \ - __bpf_constant_htons(x) : __bpf_htons(x)) -#define bpf_ntohs(x) \ - (__builtin_constant_p(x) ? \ - __bpf_constant_ntohs(x) : __bpf_ntohs(x)) -#define bpf_htonl(x) \ - (__builtin_constant_p(x) ? \ - __bpf_constant_htonl(x) : __bpf_htonl(x)) -#define bpf_ntohl(x) \ - (__builtin_constant_p(x) ? \ - __bpf_constant_ntohl(x) : __bpf_ntohl(x)) -#define bpf_cpu_to_be64(x) \ - (__builtin_constant_p(x) ? \ - __bpf_constant_cpu_to_be64(x) : __bpf_cpu_to_be64(x)) -#define bpf_be64_to_cpu(x) \ - (__builtin_constant_p(x) ? 
\ - __bpf_constant_be64_to_cpu(x) : __bpf_be64_to_cpu(x) + __bpf_constant_##op(x) : __bpf_##op(x)) + +#define bpf_htons(x) __bpf_endop(htons, x) +#define bpf_ntohs(x) __bpf_endop(ntohs, x) +#define bpf_htonl(x) __bpf_endop(htonl, x) +#define bpf_ntohl(x) __bpf_endop(ntohl, x) +#define bpf_cpu_to_be64(x) __bpf_endop(cpu_to_be64, x) +#define bpf_be64_to_cpu(x) __bpf_endop(be64_to_cpu, x) #endif /* __BPF_ENDIAN__ */
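Call sites are unaffected by the compression; a minimal sketch (ETH_P_IPV6_SKETCH is an illustrative stand-in for the uapi ETH_P_IPV6 constant):

	#include <linux/types.h>
	#include <bpf/bpf_endian.h>

	#define ETH_P_IPV6_SKETCH 0x86DD

	static inline int is_ipv6_sketch(__be16 h_proto)
	{
		/* constant operand: __bpf_endop() picks
		 * __bpf_constant_htons() and folds it at build time;
		 * a runtime operand would expand to __bpf_htons() instead
		 */
		return h_proto == bpf_htons(ETH_P_IPV6_SKETCH);
	}
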
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 49/52] libbpf: add LE <--> CPU conversion helpers Date: Tue, 28 Jun 2022 21:48:09 +0200 Message-Id: <20220628194812.1453059-50-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC From: Larysa Zaremba XDP Generic metadata structure has fields of the explicit Endianness, all 16, 32 and 64-bit wide. To make it easier to access them, define __le{16,32,64} <--> cpu helpers the same way it's done for the BEs. Signed-off-by: Larysa Zaremba Signed-off-by: Alexander Lobakin --- tools/lib/bpf/bpf_endian.h | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/tools/lib/bpf/bpf_endian.h b/tools/lib/bpf/bpf_endian.h index b03db6aa3f14..35941e6f1d99 100644 --- a/tools/lib/bpf/bpf_endian.h +++ b/tools/lib/bpf/bpf_endian.h @@ -60,6 +60,18 @@ # define __bpf_cpu_to_be64(x) __builtin_bswap64(x) # define __bpf_constant_be64_to_cpu(x) ___bpf_swab64(x) # define __bpf_constant_cpu_to_be64(x) ___bpf_swab64(x) +# define __bpf_le16_to_cpu(x) (x) +# define __bpf_cpu_to_le16(x) (x) +# define __bpf_constant_le16_to_cpu(x) (x) +# define __bpf_constant_cpu_to_le16(x) (x) +# define __bpf_le32_to_cpu(x) (x) +# define __bpf_cpu_to_le32(x) (x) +# define __bpf_constant_le32_to_cpu(x) (x) +# define __bpf_constant_cpu_to_le32(x) (x) +# define __bpf_le64_to_cpu(x) (x) +# define __bpf_cpu_to_le64(x) (x) +# define __bpf_constant_le64_to_cpu(x) (x) +# define __bpf_constant_cpu_to_le64(x) (x) #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ # define __bpf_ntohs(x) (x) # define __bpf_htons(x) (x) @@ -73,6 +85,18 @@ # define __bpf_cpu_to_be64(x) (x) # define __bpf_constant_be64_to_cpu(x) (x) # define __bpf_constant_cpu_to_be64(x) (x) +# define __bpf_le16_to_cpu(x) __builtin_bswap16(x) +# define __bpf_cpu_to_le16(x) __builtin_bswap16(x) +# define __bpf_constant_le16_to_cpu(x) ___bpf_swab16(x) +# define __bpf_constant_cpu_to_le16(x) ___bpf_swab16(x) +# define __bpf_le32_to_cpu(x) __builtin_bswap32(x) +# define __bpf_cpu_to_le32(x) __builtin_bswap32(x) +# define __bpf_constant_le32_to_cpu(x) ___bpf_swab32(x) +# define __bpf_constant_cpu_to_le32(x) ___bpf_swab32(x) +# define __bpf_le64_to_cpu(x) __builtin_bswap64(x) +# define __bpf_cpu_to_le64(x) __builtin_bswap64(x) +# define __bpf_constant_le64_to_cpu(x) ___bpf_swab64(x) +# define __bpf_constant_cpu_to_le64(x) ___bpf_swab64(x) #else # error "Fix your compiler's __BYTE_ORDER__?!" 
Signed-off-by: Larysa Zaremba
Signed-off-by: Alexander Lobakin
---
 tools/lib/bpf/bpf_endian.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/tools/lib/bpf/bpf_endian.h b/tools/lib/bpf/bpf_endian.h
index b03db6aa3f14..35941e6f1d99 100644
--- a/tools/lib/bpf/bpf_endian.h
+++ b/tools/lib/bpf/bpf_endian.h
@@ -60,6 +60,18 @@
 # define __bpf_cpu_to_be64(x) __builtin_bswap64(x)
 # define __bpf_constant_be64_to_cpu(x) ___bpf_swab64(x)
 # define __bpf_constant_cpu_to_be64(x) ___bpf_swab64(x)
+# define __bpf_le16_to_cpu(x) (x)
+# define __bpf_cpu_to_le16(x) (x)
+# define __bpf_constant_le16_to_cpu(x) (x)
+# define __bpf_constant_cpu_to_le16(x) (x)
+# define __bpf_le32_to_cpu(x) (x)
+# define __bpf_cpu_to_le32(x) (x)
+# define __bpf_constant_le32_to_cpu(x) (x)
+# define __bpf_constant_cpu_to_le32(x) (x)
+# define __bpf_le64_to_cpu(x) (x)
+# define __bpf_cpu_to_le64(x) (x)
+# define __bpf_constant_le64_to_cpu(x) (x)
+# define __bpf_constant_cpu_to_le64(x) (x)
 #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
 # define __bpf_ntohs(x) (x)
 # define __bpf_htons(x) (x)
@@ -73,6 +85,18 @@
 # define __bpf_cpu_to_be64(x) (x)
 # define __bpf_constant_be64_to_cpu(x) (x)
 # define __bpf_constant_cpu_to_be64(x) (x)
+# define __bpf_le16_to_cpu(x) __builtin_bswap16(x)
+# define __bpf_cpu_to_le16(x) __builtin_bswap16(x)
+# define __bpf_constant_le16_to_cpu(x) ___bpf_swab16(x)
+# define __bpf_constant_cpu_to_le16(x) ___bpf_swab16(x)
+# define __bpf_le32_to_cpu(x) __builtin_bswap32(x)
+# define __bpf_cpu_to_le32(x) __builtin_bswap32(x)
+# define __bpf_constant_le32_to_cpu(x) ___bpf_swab32(x)
+# define __bpf_constant_cpu_to_le32(x) ___bpf_swab32(x)
+# define __bpf_le64_to_cpu(x) __builtin_bswap64(x)
+# define __bpf_cpu_to_le64(x) __builtin_bswap64(x)
+# define __bpf_constant_le64_to_cpu(x) ___bpf_swab64(x)
+# define __bpf_constant_cpu_to_le64(x) ___bpf_swab64(x)
 #else
 # error "Fix your compiler's __BYTE_ORDER__?!"
 #endif
@@ -87,5 +111,11 @@
 #define bpf_ntohl(x)	__bpf_endop(ntohl, x)
 #define bpf_cpu_to_be64(x)	__bpf_endop(cpu_to_be64, x)
 #define bpf_be64_to_cpu(x)	__bpf_endop(be64_to_cpu, x)
+#define bpf_cpu_to_le16(x)	__bpf_endop(cpu_to_le16, x)
+#define bpf_le16_to_cpu(x)	__bpf_endop(le16_to_cpu, x)
+#define bpf_cpu_to_le32(x)	__bpf_endop(cpu_to_le32, x)
+#define bpf_le32_to_cpu(x)	__bpf_endop(le32_to_cpu, x)
+#define bpf_cpu_to_le64(x)	__bpf_endop(cpu_to_le64, x)
+#define bpf_le64_to_cpu(x)	__bpf_endop(le64_to_cpu, x)
 
 #endif /* __BPF_ENDIAN__ */

From patchwork Tue Jun 28 19:48:10 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12898875
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexander Lobakin
To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko
Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski ,
 Jesper Dangaard Brouer , Björn Töpel , Magnus Karlsson ,
 Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen ,
 Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski ,
 Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng ,
 Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 50/52] libbpf: introduce a couple memory access helpers Date: Tue, 28 Jun 2022 21:48:10 +0200 Message-Id: <20220628194812.1453059-51-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC From: Larysa Zaremba In BPF programs, it is a common thing to declare that we're going to do a memory access via such snippet: if (data + ETH_HLEN > data_end) // bail out Offsets can be variable: if (VLAN_HLEN * vlan_count > SOME_ARBITRARY_MAX_OFFSET || ctx->data + VLAN_HLEN * vlan_count > data_end) // Or even calculated from the end: if (ctx->data_end - ctx->data - ETH_FCS_LEN > SOME_ARB_MAX_OFF || ctx->data_end - ETH_FCS_LEN < ctx->data) // As a bonus, LLVM sometimes has a hard time compiling sane C code the way that it would pass the in-kernel verifier. Add two new functions to sanitize memory accesses and get pointers to the requested ranges: one taking an offset from the start and one from the end (useful for metadata and different integrity check headers). They are written in Asm, so the offset can be variable and the code will pass the verifier. There are checks for the maximum offset (backed by the original verifier value), going out of bounds etc., so the pointer they return is ready to use (if it's non-%NULL). So now all is needed is: iphdr = bpf_access_mem(ctx->data, ctx->data_end, ETH_HLEN, sizeof(*iphdr)); if (!iphdr) // bail out or some_meta_struct = bpf_access_mem_end(ctx->data_meta, ctx->data, sizeof(*some_meta_struct), sizeof(*some_meta_struct)); if (!some_meta_struct) // The Asm code was happily stolen from the Cilium project repo[0] and then reworked. [0] https://github.com/cilium/cilium/blob/master/bpf/include/bpf/ctx/xdp.h#L43 Suggested-by: Daniel Borkmann # original helper Suggested-by: Toke Høiland-Jørgensen Signed-off-by: Larysa Zaremba Co-developed-by: Alexander Lobakin Signed-off-by: Alexander Lobakin --- tools/lib/bpf/bpf_helpers.h | 64 +++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h index fb04eaf367f1..cd16e3c9cd85 100644 --- a/tools/lib/bpf/bpf_helpers.h +++ b/tools/lib/bpf/bpf_helpers.h @@ -285,4 +285,68 @@ enum libbpf_tristate { /* Helper macro to print out debug messages */ #define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args) +/* Max offset as per kernel verifier */ +#define MAX_PACKET_OFF 0xffff + +/** + * bpf_access_mem - sanitize memory access to a range + * @mem: start of the memory segment + * @mem_end: end of the memory segment + * @off: offset from the start of the memory segment + * @len: length of the range to give access to + * + * Verifies that the memory operations we want to perform are sane and within + * bounds and gives pointer to the requested range. The checks are done in Asm, + * so that it is safe to pass variable offset (verifier might reject such code + * written in plain C). 
+ * The intended way of using it is as follows:
+ *
+ * iphdr = bpf_access_mem(ctx->data, ctx->data_end, ETH_HLEN, sizeof(*iphdr));
+ *
+ * Returns pointer to the beginning of the range or %NULL.
+ */
+static __always_inline void *
+bpf_access_mem(__u64 mem, __u64 mem_end, __u64 off, const __u64 len)
+{
+	void *ret;
+
+	asm volatile("r1 = %[start]\n\t"
+		     "r2 = %[end]\n\t"
+		     "r3 = %[offmax] - %[len]\n\t"
+		     "if %[off] > r3 goto +5\n\t"
+		     "r1 += %[off]\n\t"
+		     "%[ret] = r1\n\t"
+		     "r1 += %[len]\n\t"
+		     "if r1 > r2 goto +1\n\t"
+		     "goto +1\n\t"
+		     "%[ret] = %[null]\n\t"
+		     : [ret]"=r"(ret)
+		     : [start]"r"(mem), [end]"r"(mem_end), [off]"r"(off),
+		       [len]"ri"(len), [offmax]"i"(MAX_PACKET_OFF),
+		       [null]"i"(NULL)
+		     : "r1", "r2", "r3");
+
+	return ret;
+}
+
+/**
+ * bpf_access_mem_end - sanitize memory access to a range at the end of segment
+ * @mem: start of the memory segment
+ * @mem_end: end of the memory segment
+ * @offend: offset from the end of the memory segment
+ * @len: length of the range to give access to
+ *
+ * Version of bpf_access_mem() which performs all needed calculations to
+ * access a memory segment from the end. E.g., to access FCS (if provided):
+ *
+ * cp = bpf_access_mem_end(ctx->data, ctx->data_end, ETH_FCS_LEN, ETH_FCS_LEN);
+ *
+ * Returns pointer to the beginning of the range or %NULL.
+ */
+static __always_inline void *
+bpf_access_mem_end(__u64 mem, __u64 mem_end, __u64 offend, const __u64 len)
+{
+	return bpf_access_mem(mem, mem_end, mem_end - mem - offend, len);
+}
+
 #endif

From patchwork Tue Jun 28 19:48:11 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12898877
X-Patchwork-Delegate: bpf@iogearbox.net
d="scan'208";a="767288265" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by orsmga005.jf.intel.com with ESMTP; 28 Jun 2022 12:50:03 -0700 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 25SJmr9p022013; Tue, 28 Jun 2022 20:50:01 +0100 From: Alexander Lobakin To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski , Jesper Dangaard Brouer , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen , Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng , Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net Subject: [PATCH RFC bpf-next 51/52] selftests/bpf: fix using test_xdp_meta BPF prog via skeleton infra Date: Tue, 28 Jun 2022 21:48:11 +0200 Message-Id: <20220628194812.1453059-52-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com> References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC progs/test_xdp_meta works fine when loading via iproute2, but the skeleton infra can't load it, saying that the types of the BPF programs present in the binary are not set. This is due to that the convention is to place XDP progs in the section which named 'xdp' and TC BPF progs in the section 'tc', so do it here as well. Fixes: 22c8852624fc ("bpf: improve selftests and add tests for meta pointer") Signed-off-by: Alexander Lobakin --- tools/testing/selftests/bpf/progs/test_xdp_meta.c | 4 ++-- tools/testing/selftests/bpf/test_xdp_meta.sh | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/test_xdp_meta.c b/tools/testing/selftests/bpf/progs/test_xdp_meta.c index a7c4a7d49fe6..fe2d71ae0e71 100644 --- a/tools/testing/selftests/bpf/progs/test_xdp_meta.c +++ b/tools/testing/selftests/bpf/progs/test_xdp_meta.c @@ -8,7 +8,7 @@ #define round_up(x, y) ((((x) - 1) | __round_mask(x, y)) + 1) #define ctx_ptr(ctx, mem) (void *)(unsigned long)ctx->mem -SEC("t") +SEC("tc") int ing_cls(struct __sk_buff *ctx) { __u8 *data, *data_meta, *data_end; @@ -28,7 +28,7 @@ int ing_cls(struct __sk_buff *ctx) return diff ? 
 }
 
-SEC("x")
+SEC("xdp")
 int ing_xdp(struct xdp_md *ctx)
 {
 	__u8 *data, *data_meta, *data_end;
diff --git a/tools/testing/selftests/bpf/test_xdp_meta.sh b/tools/testing/selftests/bpf/test_xdp_meta.sh
index ea69370caae3..7232714e89b3 100755
--- a/tools/testing/selftests/bpf/test_xdp_meta.sh
+++ b/tools/testing/selftests/bpf/test_xdp_meta.sh
@@ -42,11 +42,11 @@ ip netns exec ${NS2} ip addr add 10.1.1.22/24 dev veth2
 ip netns exec ${NS1} tc qdisc add dev veth1 clsact
 ip netns exec ${NS2} tc qdisc add dev veth2 clsact
 
-ip netns exec ${NS1} tc filter add dev veth1 ingress bpf da obj test_xdp_meta.o sec t
-ip netns exec ${NS2} tc filter add dev veth2 ingress bpf da obj test_xdp_meta.o sec t
+ip netns exec ${NS1} tc filter add dev veth1 ingress bpf da obj test_xdp_meta.o sec tc
+ip netns exec ${NS2} tc filter add dev veth2 ingress bpf da obj test_xdp_meta.o sec tc
 
-ip netns exec ${NS1} ip link set dev veth1 xdp obj test_xdp_meta.o sec x
-ip netns exec ${NS2} ip link set dev veth2 xdp obj test_xdp_meta.o sec x
+ip netns exec ${NS1} ip link set dev veth1 xdp obj test_xdp_meta.o sec xdp
+ip netns exec ${NS2} ip link set dev veth2 xdp obj test_xdp_meta.o sec xdp
 
 ip netns exec ${NS1} ip link set dev veth1 up
 ip netns exec ${NS2} ip link set dev veth2 up

From patchwork Tue Jun 28 19:48:12 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12898879
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexander Lobakin
To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko
Cc: Alexander Lobakin , Larysa Zaremba , Michal Swiatkowski ,
 Jesper Dangaard Brouer , Björn Töpel , Magnus Karlsson ,
 Maciej Fijalkowski , Jonathan Lemon , Toke Hoiland-Jorgensen ,
 Lorenzo Bianconi , "David S. Miller" , Eric Dumazet , Jakub Kicinski ,
 Paolo Abeni , Jesse Brandeburg , John Fastabend , Yajun Deng ,
 Willem de Bruijn , bpf@vger.kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [PATCH RFC bpf-next 52/52] selftests/bpf: add XDP Generic Hints
 selftest
Date: Tue, 28 Jun 2022 21:48:12 +0200
Message-Id: <20220628194812.1453059-53-alexandr.lobakin@intel.com>
X-Mailer: git-send-email 2.36.1
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Add a new BPF selftest which checks that XDP Generic metadata works
correctly on the generic/skb XDP path. That path is always available
on any interface, so the test must always succeed.
It uses a special BPF program which works as follows:

* tries to access the metadata memory via bpf_access_mem_end();
* checks the frame size. For sizes < 128 bytes, drops packets which
  carry metadata, so that we can verify that setting the threshold
  works;
* for sizes >= 128 bytes, drops packets with no meta. Otherwise,
  checks that the meta has the correct magic and that its BTF ID
  matches the one written by the verifier;
* finally, passes packets with fully correct Generic meta up the
  stack.

The test itself does the following:

1) attaches that XDP prog to veth interfaces with a threshold of 1,
   i.e. enables metadata generation for every packet;
2) ensures that the prog drops frames smaller than 128 bytes as
   intended (see above);
3) raises the threshold to 128 bytes (tests updating the parameters
   without replacing the prog);
4) ensures that now no drops occur and that the meta for frames >= 128
   bytes is valid.

As this involves multiple userspace prog invocations, the test pins
the BPF link to keep it running in between. `ip netns exec` creates a
new mount namespace (including sysfs) on each execution, so the script
now sets up a persistent temporary BPF FS mountpoint in the tests
directory, making the pinned progs/links accessible across the
launches.
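The pin-and-disconnect pattern this relies on is roughly the following
(a simplified sketch, error handling elided; bpf_xdp_attach_opts with
the btf_id/meta_thresh fields and the _opts attach variant are what
this series adds to libbpf, and the pin path is illustrative):

	LIBBPF_OPTS(bpf_xdp_attach_opts, la_opts,
		    .flags = XDP_FLAGS_SKB_MODE,
		    .btf_id = btf_id,
		    .meta_thresh = 1);
	struct bpf_link *link;

	link = bpf_program__attach_xdp_opts(prog, ifindex, &la_opts);
	/* Pin the link so it survives the process exiting... */
	bpf_link__pin(link, "/sys/fs/bpf/xdp/test_xdp_meta-1");
	/* ...and give up ownership, so that destroying the skeleton
	 * won't detach the prog from the interface.
	 */
	bpf_link__disconnect(link);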
Co-developed-by: Larysa Zaremba
Signed-off-by: Larysa Zaremba
Signed-off-by: Alexander Lobakin
---
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   4 +-
 .../selftests/bpf/progs/test_xdp_meta.c       |  36 +++
 tools/testing/selftests/bpf/test_xdp_meta.c   | 294 ++++++++++++++++++
 tools/testing/selftests/bpf/test_xdp_meta.sh  |  51 +++
 5 files changed, 385 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_xdp_meta.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index ca2f47f45670..7d4de9d9002c 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -44,3 +44,4 @@ test_cpp
 xdpxceiver
 xdp_redirect_multi
 xdp_synproxy
+/test_xdp_meta
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 4fbd88a8ed9e..aca8867deb8c 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -82,7 +82,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \
 TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
 	flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
 	test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
-	xdpxceiver xdp_redirect_multi xdp_synproxy
+	xdpxceiver xdp_redirect_multi xdp_synproxy test_xdp_meta
 
 TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
 
@@ -589,6 +589,8 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
 
+$(OUTPUT)/test_xdp_meta: | $(OUTPUT)/test_xdp_meta.skel.h
+
 EXTRA_CLEAN := $(TEST_CUSTOM_PROGS) $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) \
 	prog_tests/tests.h map_tests/tests.h verifier/tests.h \
 	feature bpftool \
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_meta.c b/tools/testing/selftests/bpf/progs/test_xdp_meta.c
index fe2d71ae0e71..0b05d1c3979b 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_meta.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_meta.c
@@ -2,6 +2,8 @@
 #include
 #include
+#include
+#include
 #include
 
 #define __round_mask(x, y) ((__typeof__(x))((y) - 1))
@@ -50,4 +52,38 @@ int ing_xdp(struct xdp_md *ctx)
 	return XDP_PASS;
 }
 
+#define TEST_META_THRESH 128
+
+SEC("xdp")
+int ing_hints(struct xdp_md *ctx)
+{
+	const struct xdp_meta_generic *md;
+	__le64 genid;
+
+	md = bpf_access_mem_end(ctx->data_meta, ctx->data, sizeof(*md),
+				sizeof(*md));
+
+	/* Selftest enables metadata starting from 128 byte frame size, fail it
+	 * if we receive a shorter frame with metadata
+	 */
+	if (ctx->data_end - ctx->data < TEST_META_THRESH)
+		return md ? XDP_DROP : XDP_PASS;
+
+	if (!md)
+		return XDP_DROP;
+
+	if (md->magic_id != bpf_cpu_to_le16(XDP_META_GENERIC_MAGIC))
+		return XDP_DROP;
+
+	genid = bpf_cpu_to_le64(bpf_core_type_id_kernel(typeof(*md)));
+	if (md->full_id != genid)
+		return XDP_DROP;
+
+	/* Tx flags must be zeroed */
+	if (md->tx_flags)
+		return XDP_DROP;
+
+	return XDP_PASS;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_xdp_meta.c b/tools/testing/selftests/bpf/test_xdp_meta.c
new file mode 100644
index 000000000000..e5c147d19190
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_xdp_meta.c
@@ -0,0 +1,294 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022, Intel Corporation. */
+
+#define _GNU_SOURCE	/* asprintf() */
+
+#include
+#include
+#include
+#include
+
+#include "test_xdp_meta.skel.h"
+
+struct test_meta_op_opts {
+	struct test_xdp_meta *skel;
+	const char *cmd;
+	char *path;
+	__u32 ifindex;
+	__u32 flags;
+	__u64 btf_id;
+	__u32 meta_thresh;
+};
+
+struct test_meta_opt_desc {
+	const char *arg;
+	const char *help;
+};
+
+#define OPT(n, a, s) {		\
+	.name = #n,		\
+	.has_arg = (a),		\
+	.val = #s[0],		\
+}
+
+#define DESC(a, h) {		\
+	.arg = (a),		\
+	.help = (h),		\
+}
+
+static const struct option test_meta_opts[] = {
+	OPT(dev, required_argument, d),
+	OPT(fs, required_argument, f),
+	OPT(help, no_argument, h),
+	OPT(meta-thresh, optional_argument, M),
+	OPT(mode, required_argument, m),
+	{ /* Sentinel */ },
+};
+
+static const struct test_meta_opt_desc test_meta_descs[] = {
+	DESC("= < IFNAME | IFINDEX >", "target interface name or index"),
+	DESC("= < MOUNTPOINT >", "BPF FS mountpoint"),
+	DESC(NULL, "display this text and exit"),
+	DESC("= [ THRESH ]", "enable Generic metadata generation (frame size)"),
+	DESC("= < skb | drv | hw >", "force particular XDP mode"),
+};
+
+static void test_meta_usage(char *argv[], bool err)
+{
+	FILE *out = err ? stderr : stdout;
+	__u32 i = 0;
+
+	fprintf(out,
+		"Usage:\n\t%s COMMAND < -d | --dev= > < IFNAME | IFINDEX > [ OPTIONS ]\n\n",
+		argv[0]);
+	fprintf(out, "OPTIONS:\n");
+
+	for (const struct option *opt = test_meta_opts; opt->name; opt++) {
+		fprintf(out, "\t-%c, --%s", opt->val, opt->name);
+		fprintf(out, "%s\t", test_meta_descs[i].arg ? : "\t\t");
+		fprintf(out, "%s\n", test_meta_descs[i++].help);
+	}
+}
+
+static int test_meta_link_attach(const struct test_meta_op_opts *opts)
+{
+	LIBBPF_OPTS(bpf_xdp_attach_opts, la_opts,
+		    .flags = opts->flags,
+		    .btf_id = opts->btf_id,
+		    .meta_thresh = opts->meta_thresh);
+	struct bpf_link *link;
+	int ret;
+
+	link = bpf_program__attach_xdp_opts(opts->skel->progs.ing_hints,
+					    opts->ifindex, &la_opts);
+	ret = libbpf_get_error(link);
+	if (ret) {
+		fprintf(stderr, "Failed to attach XDP program: %s (%d)\n",
+			strerror(-ret), ret);
+		return ret;
+	}
+
+	opts->skel->links.ing_hints = link;
+
+	ret = bpf_link__pin(link, opts->path);
+	if (ret)
+		fprintf(stderr, "Failed to pin XDP link at %s: %s (%d)\n",
			opts->path, strerror(-ret), ret);
+
+	bpf_link__disconnect(link);
+
+	return ret;
+}
+
+static int test_meta_link_update(const struct test_meta_op_opts *opts)
+{
+	LIBBPF_OPTS(bpf_link_update_opts, lu_opts,
+		    .xdp.new_btf_id = opts->btf_id,
+		    .xdp.new_meta_thresh = opts->meta_thresh);
+	struct bpf_link *link;
+	int ret;
+
+	link = bpf_link__open(opts->path);
+	ret = libbpf_get_error(link);
+	if (ret) {
+		fprintf(stderr, "Failed to open XDP link at %s: %s (%d)\n",
+			opts->path, strerror(-ret), ret);
+		return ret;
+	}
+
+	opts->skel->links.ing_hints = link;
+
+	ret = bpf_link_update(bpf_link__fd(link),
+			      bpf_program__fd(opts->skel->progs.ing_hints),
+			      &lu_opts);
+	if (ret)
+		fprintf(stderr, "Failed to update XDP link: %s (%d)\n",
+			strerror(-ret), ret);
+
+	return ret;
+}
+
+static int test_meta_link_detach(const struct test_meta_op_opts *opts)
+{
+	struct bpf_link *link;
+	int ret;
+
+	link = bpf_link__open(opts->path);
+	ret = libbpf_get_error(link);
+	if (ret) {
+		fprintf(stderr, "Failed to open XDP link at %s: %s (%d)\n",
+			opts->path, strerror(-ret), ret);
+		return ret;
+	}
+
+	opts->skel->links.ing_hints = link;
+
+	ret = bpf_link__unpin(link);
+	if (ret) {
+		fprintf(stderr, "Failed to unpin XDP link: %s (%d)\n",
+			strerror(-ret), ret);
+		return ret;
+	}
+
+	ret = bpf_link__detach(link);
+	if (ret)
+		fprintf(stderr, "Failed to detach XDP link: %s (%d)\n",
+			strerror(-ret), ret);
+
+	return ret;
+}
+
+static int test_meta_parse_args(struct test_meta_op_opts *opts, int argc,
+				char *argv[])
+{
+	int opt, longidx, ret;
+
+	while (1) {
+		opt = getopt_long(argc, argv, "d:f:hM::m:", test_meta_opts,
+				  &longidx);
+		if (opt < 0)
+			break;
+
+		switch (opt) {
+		case 'd':
+			opts->ifindex = if_nametoindex(optarg);
+			if (!opts->ifindex)
+				opts->ifindex = strtoul(optarg, NULL, 0);
+
+			break;
+		case 'f':
+			opts->path = optarg;
+			break;
+		case 'h':
+			test_meta_usage(argv, false);
+			return 0;
+		case 'M':
+			ret = libbpf_get_type_btf_id("struct xdp_meta_generic",
+						     &opts->btf_id);
+			if (ret) {
+				fprintf(stderr,
+					"Failed to get BTF ID: %s (%d)\n",
+					strerror(-ret), ret);
+				return ret;
+			}
+
+			/* Allow both `-M64` and `-M 64` */
+			if (!optarg && optind < argc && argv[optind] &&
+			    *argv[optind] >= '0' && *argv[optind] <= '9')
+				optarg = argv[optind];
+
+			opts->meta_thresh = strtoul(optarg ? : "1", NULL, 0);
+			break;
+		case 'm':
+			if (!strcmp(optarg, "skb"))
+				opts->flags = XDP_FLAGS_SKB_MODE;
+			else if (!strcmp(optarg, "drv"))
+				opts->flags = XDP_FLAGS_DRV_MODE;
+			else if (!strcmp(optarg, "hw"))
+				opts->flags = XDP_FLAGS_HW_MODE;
+
+			if (opts->flags)
+				break;
+
+			/* fallthrough */
+		default:
+			test_meta_usage(argv, true);
+			return -EINVAL;
+		}
+	}
+
+	if (optind >= argc || !argv[optind]) {
+		fprintf(stderr, "Command is required\n");
+		test_meta_usage(argv, true);
+
+		return -EINVAL;
+	}
+
+	opts->cmd = argv[optind];
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	struct test_meta_op_opts opts = { };
+	int ret;
+
+	libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
+
+	if (argc < 3) {
+		test_meta_usage(argv, true);
+		return -EINVAL;
+	}
+
+	ret = test_meta_parse_args(&opts, argc, argv);
+	if (ret)
+		return ret;
+
+	if (!opts.ifindex) {
+		fprintf(stderr, "Invalid or missing device argument\n");
+		test_meta_usage(argv, true);
+
+		return -EINVAL;
+	}
+
+	opts.skel = test_xdp_meta__open_and_load();
+	ret = libbpf_get_error(opts.skel);
+	if (ret) {
+		fprintf(stderr, "Failed to load test_xdp_meta skeleton: %s (%d)\n",
+			strerror(-ret), ret);
+		return ret;
+	}
+
+	ret = asprintf(&opts.path, "%s/xdp/%s-%u", opts.path ? : "/sys/fs/bpf",
+		       opts.skel->skeleton->name, opts.ifindex);
+	ret = ret < 0 ? -errno : 0;
+	if (ret) {
+		fprintf(stderr, "Failed to allocate path string: %s (%d)\n",
+			strerror(-ret), ret);
+		goto meta_destroy;
+	}
+
+	if (!strcmp(opts.cmd, "attach")) {
+		ret = test_meta_link_attach(&opts);
+	} else if (!strcmp(opts.cmd, "update")) {
+		ret = test_meta_link_update(&opts);
+	} else if (!strcmp(opts.cmd, "detach")) {
+		ret = test_meta_link_detach(&opts);
+	} else {
+		fprintf(stderr, "Invalid command '%s'\n", opts.cmd);
+		test_meta_usage(argv, true);
+
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		fprintf(stderr, "Failed to execute command '%s': %s (%d)\n",
+			opts.cmd, strerror(-ret), ret);
+
+	free(opts.path);
+meta_destroy:
+	test_xdp_meta__destroy(opts.skel);
+
+	return ret;
+}
diff --git a/tools/testing/selftests/bpf/test_xdp_meta.sh b/tools/testing/selftests/bpf/test_xdp_meta.sh
index 7232714e89b3..79c2ccb68dda 100755
--- a/tools/testing/selftests/bpf/test_xdp_meta.sh
+++ b/tools/testing/selftests/bpf/test_xdp_meta.sh
@@ -5,6 +5,11 @@ readonly KSFT_SKIP=4
 readonly NS1="ns1-$(mktemp -u XXXXXX)"
 readonly NS2="ns2-$(mktemp -u XXXXXX)"
 
+# We need a persistent BPF FS mountpoint.
+# `ip netns exec` prepares a different temporary one on each invocation
+readonly FS="$(mktemp -d XXXXXX)"
+mount -t bpf bpffs ${FS}
+
 cleanup()
 {
 	if [ "$?" = "0" ]; then
@@ -14,9 +19,16 @@ cleanup()
 	fi
 
 	set +e
+
+	ip netns exec ${NS1} ./test_xdp_meta detach -d veth1 -f ${FS} -m skb 2> /dev/null
+	ip netns exec ${NS2} ./test_xdp_meta detach -d veth2 -f ${FS} -m skb 2> /dev/null
+
 	ip link del veth1 2> /dev/null
 	ip netns del ${NS1} 2> /dev/null
 	ip netns del ${NS2} 2> /dev/null
+
+	umount ${FS}
+	rm -fr ${FS}
 }
 
 ip link set dev lo xdp off 2>/dev/null > /dev/null
@@ -54,4 +66,43 @@ ip netns exec ${NS2} ip link set dev veth2 up
 
 ip netns exec ${NS1} ping -c 1 10.1.1.22
 ip netns exec ${NS2} ping -c 1 10.1.1.11
+#
+# Generic metadata part
+#
+
+# Cleanup
+ip netns exec ${NS1} ip link set dev veth1 xdp off
+ip netns exec ${NS2} ip link set dev veth2 xdp off
+
+ip netns exec ${NS1} tc filter del dev veth1 ingress
+ip netns exec ${NS2} tc filter del dev veth2 ingress
+
+# Enable metadata generation for every frame
+ip netns exec ${NS1} ./test_xdp_meta attach -d veth1 -f ${FS} -m skb -M
+ip netns exec ${NS2} ./test_xdp_meta attach -d veth2 -f ${FS} -m skb -M
+
+# Those two must fail: XDP prog drops packets < 128 bytes with metadata
+set +e
+
+ip netns exec ${NS1} ping -c 1 10.1.1.22 -W 0.2
+if [ "$?" = "0" ]; then
+	exit 1
+fi
+ip netns exec ${NS2} ping -c 1 10.1.1.11 -W 0.2
+if [ "$?" = "0" ]; then
+	exit 1
+fi
+
+set -e
+
+# Enable metadata only for frames >= 128 bytes
+ip netns exec ${NS1} ./test_xdp_meta update -d veth1 -f ${FS} -m skb -M 128
+ip netns exec ${NS2} ./test_xdp_meta update -d veth2 -f ${FS} -m skb -M 128
+
+# Must succeed
+ip netns exec ${NS1} ping -c 1 10.1.1.22
+ip netns exec ${NS2} ping -c 1 10.1.1.11
+ip netns exec ${NS1} ping -c 1 10.1.1.22 -s 128
+ip netns exec ${NS2} ping -c 1 10.1.1.11 -s 128
+
 exit 0
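For completeness, a hedged sketch of the variable-offset access that
motivated bpf_access_mem() in the first place (not part of the
patches; assumes vlan_count was parsed earlier and ETH_HLEN/VLAN_HLEN
are defined locally):

	struct iphdr *iph;
	__u64 off = ETH_HLEN + vlan_count * VLAN_HLEN;

	/* Plain C bounds checks with a variable offset often get rejected
	 * by the verifier; bpf_access_mem() performs the same checks in
	 * Asm it is known to accept.
	 */
	iph = bpf_access_mem(ctx->data, ctx->data_end, off, sizeof(*iph));
	if (!iph)
		return XDP_DROP;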