From patchwork Mon May 22 23:29:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13251260 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6CB12C7EE2D for ; Tue, 23 May 2023 00:24:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229702AbjEWAYp convert rfc822-to-8bit (ORCPT ); Mon, 22 May 2023 20:24:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235181AbjEVXaH (ORCPT ); Mon, 22 May 2023 19:30:07 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E49BE52 for ; Mon, 22 May 2023 16:29:31 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 34MMkxIj019125 for ; Mon, 22 May 2023 16:29:30 -0700 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3qpur5ydyj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 22 May 2023 16:29:30 -0700 Received: from twshared1631.42.prn1.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Mon, 22 May 2023 16:29:29 -0700 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id 1E396312BC06F; Mon, 22 May 2023 16:29:21 -0700 (PDT) From: Andrii Nakryiko To: , , , CC: , , , , Andrii Nakryiko Subject: [PATCH v3 bpf-next 1/4] bpf: validate BPF object in BPF_OBJ_PIN before calling LSM Date: Mon, 22 May 2023 16:29:14 -0700 Message-ID: <20230522232917.2454595-2-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230522232917.2454595-1-andrii@kernel.org> References: <20230522232917.2454595-1-andrii@kernel.org> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 9jQV-LL24upW9x6iZv5lbMLTWhKCulzy X-Proofpoint-ORIG-GUID: 9jQV-LL24upW9x6iZv5lbMLTWhKCulzy X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-05-22_17,2023-05-22_03,2023-05-22_02 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Do a sanity check whether provided file-to-be-pinned is actually a BPF object (prog, map, btf) before calling security_path_mknod LSM hook. If it's not, LSM hook doesn't have to be triggered, as the operation has no chance of succeeding anyways. Suggested-by: Christian Brauner Signed-off-by: Andrii Nakryiko Reviewed-by: Christian Brauner --- kernel/bpf/inode.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 9948b542a470..329f27d5cacf 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -448,18 +448,17 @@ static int bpf_obj_do_pin(const char __user *pathname, void *raw, if (IS_ERR(dentry)) return PTR_ERR(dentry); - mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask()); - - ret = security_path_mknod(&path, dentry, mode, 0); - if (ret) - goto out; - dir = d_inode(path.dentry); if (dir->i_op != &bpf_dir_iops) { ret = -EPERM; goto out; } + mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask()); + ret = security_path_mknod(&path, dentry, mode, 0); + if (ret) + goto out; + switch (type) { case BPF_TYPE_PROG: ret = vfs_mkobj(dentry, mode, bpf_mkprog, raw); From patchwork Mon May 22 23:29:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13251263 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18A7BC7EE2A for ; Tue, 23 May 2023 00:24:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232576AbjEWAYt convert rfc822-to-8bit (ORCPT ); Mon, 22 May 2023 20:24:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235179AbjEVXaH (ORCPT ); Mon, 22 May 2023 19:30:07 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4C7AE53 for ; Mon, 22 May 2023 16:29:34 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 34MMkxIp019125 for ; Mon, 22 May 2023 16:29:34 -0700 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3qpur5ydyj-7 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 22 May 2023 16:29:33 -0700 Received: from twshared1631.42.prn1.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Mon, 22 May 2023 16:29:29 -0700 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id 2A9E1312BC074; Mon, 22 May 2023 16:29:23 -0700 (PDT) From: Andrii Nakryiko To: , , , CC: , , , , Andrii Nakryiko Subject: [PATCH v3 bpf-next 2/4] bpf: support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands Date: Mon, 22 May 2023 16:29:15 -0700 Message-ID: <20230522232917.2454595-3-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230522232917.2454595-1-andrii@kernel.org> References: <20230522232917.2454595-1-andrii@kernel.org> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: mMG8xR_PoL1tMlV50IPrpJxw6GBUy7yo X-Proofpoint-ORIG-GUID: mMG8xR_PoL1tMlV50IPrpJxw6GBUy7yo X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-05-22_17,2023-05-22_03,2023-05-22_02 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall forces users to specify pinning location as a string-based absolute or relative (to current working directory) path. This has various implications related to security (e.g., symlink-based attacks), forces BPF FS to be exposed in the file system, which can cause races with other applications. One of the feedbacks we got from folks working with containers heavily was that inability to use purely FD-based location specification was an unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET commands. This patch closes this oversight, adding path_fd field to BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by *at() syscalls for dirfd + pathname combinations. This now allows interesting possibilities like working with detached BPF FS mount (e.g., to perform multiple pinnings without running a risk of someone interfering with them), and generally making pinning/getting more secure and not prone to any races and/or security attacks. This is demonstrated by a selftest added in subsequent patch that takes advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate creating detached BPF FS mount, pinning, and then getting BPF map out of it, all while never exposing this private instance of BPF FS to outside worlds. Signed-off-by: Andrii Nakryiko Reviewed-by: Christian Brauner --- include/linux/bpf.h | 4 ++-- include/uapi/linux/bpf.h | 10 ++++++++++ kernel/bpf/inode.c | 16 ++++++++-------- kernel/bpf/syscall.c | 25 ++++++++++++++++++++----- tools/include/uapi/linux/bpf.h | 10 ++++++++++ 5 files changed, 50 insertions(+), 15 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 36e4b2d8cca2..f58895830ada 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2077,8 +2077,8 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd); struct bpf_link *bpf_link_get_from_fd(u32 ufd); struct bpf_link *bpf_link_get_curr_or_next(u32 *id); -int bpf_obj_pin_user(u32 ufd, const char __user *pathname); -int bpf_obj_get_user(const char __user *pathname, int flags); +int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname); +int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags); #define BPF_ITER_FUNC_PREFIX "bpf_iter_" #define DEFINE_BPF_ITER_FUNC(target, args...) \ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 1bb11a6ee667..9273c654743c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1272,6 +1272,9 @@ enum { /* Create a map that will be registered/unregesitered by the backed bpf_link */ BPF_F_LINK = (1U << 13), + +/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */ + BPF_F_PATH_FD = (1U << 14), }; /* Flags for BPF_PROG_QUERY. */ @@ -1420,6 +1423,13 @@ union bpf_attr { __aligned_u64 pathname; __u32 bpf_fd; __u32 file_flags; + /* Same as dirfd in openat() syscall; see openat(2) + * manpage for details of path FD and pathname semantics; + * path_fd should accompanied by BPF_F_PATH_FD flag set in + * file_flags field, otherwise it should be set to zero; + * if BPF_F_PATH_FD flag is not set, AT_FDCWD is assumed. + */ + __s32 path_fd; }; struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 329f27d5cacf..4174f76133df 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -435,7 +435,7 @@ static int bpf_iter_link_pin_kernel(struct dentry *parent, return ret; } -static int bpf_obj_do_pin(const char __user *pathname, void *raw, +static int bpf_obj_do_pin(int path_fd, const char __user *pathname, void *raw, enum bpf_type type) { struct dentry *dentry; @@ -444,7 +444,7 @@ static int bpf_obj_do_pin(const char __user *pathname, void *raw, umode_t mode; int ret; - dentry = user_path_create(AT_FDCWD, pathname, &path, 0); + dentry = user_path_create(path_fd, pathname, &path, 0); if (IS_ERR(dentry)) return PTR_ERR(dentry); @@ -477,7 +477,7 @@ static int bpf_obj_do_pin(const char __user *pathname, void *raw, return ret; } -int bpf_obj_pin_user(u32 ufd, const char __user *pathname) +int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname) { enum bpf_type type; void *raw; @@ -487,14 +487,14 @@ int bpf_obj_pin_user(u32 ufd, const char __user *pathname) if (IS_ERR(raw)) return PTR_ERR(raw); - ret = bpf_obj_do_pin(pathname, raw, type); + ret = bpf_obj_do_pin(path_fd, pathname, raw, type); if (ret != 0) bpf_any_put(raw, type); return ret; } -static void *bpf_obj_do_get(const char __user *pathname, +static void *bpf_obj_do_get(int path_fd, const char __user *pathname, enum bpf_type *type, int flags) { struct inode *inode; @@ -502,7 +502,7 @@ static void *bpf_obj_do_get(const char __user *pathname, void *raw; int ret; - ret = user_path_at(AT_FDCWD, pathname, LOOKUP_FOLLOW, &path); + ret = user_path_at(path_fd, pathname, LOOKUP_FOLLOW, &path); if (ret) return ERR_PTR(ret); @@ -526,7 +526,7 @@ static void *bpf_obj_do_get(const char __user *pathname, return ERR_PTR(ret); } -int bpf_obj_get_user(const char __user *pathname, int flags) +int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags) { enum bpf_type type = BPF_TYPE_UNSPEC; int f_flags; @@ -537,7 +537,7 @@ int bpf_obj_get_user(const char __user *pathname, int flags) if (f_flags < 0) return f_flags; - raw = bpf_obj_do_get(pathname, &type, f_flags); + raw = bpf_obj_do_get(path_fd, pathname, &type, f_flags); if (IS_ERR(raw)) return PTR_ERR(raw); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index b2621089904b..c7f6807215e6 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2697,23 +2697,38 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size) return err; } -#define BPF_OBJ_LAST_FIELD file_flags +#define BPF_OBJ_LAST_FIELD path_fd static int bpf_obj_pin(const union bpf_attr *attr) { - if (CHECK_ATTR(BPF_OBJ) || attr->file_flags != 0) + int path_fd; + + if (CHECK_ATTR(BPF_OBJ) || attr->file_flags & ~BPF_F_PATH_FD) + return -EINVAL; + + /* path_fd has to be accompanied by BPF_F_PATH_FD flag */ + if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd) return -EINVAL; - return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname)); + path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD; + return bpf_obj_pin_user(attr->bpf_fd, path_fd, + u64_to_user_ptr(attr->pathname)); } static int bpf_obj_get(const union bpf_attr *attr) { + int path_fd; + if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 || - attr->file_flags & ~BPF_OBJ_FLAG_MASK) + attr->file_flags & ~(BPF_OBJ_FLAG_MASK | BPF_F_PATH_FD)) + return -EINVAL; + + /* path_fd has to be accompanied by BPF_F_PATH_FD flag */ + if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd) return -EINVAL; - return bpf_obj_get_user(u64_to_user_ptr(attr->pathname), + path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD; + return bpf_obj_get_user(path_fd, u64_to_user_ptr(attr->pathname), attr->file_flags); } diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 1bb11a6ee667..9273c654743c 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1272,6 +1272,9 @@ enum { /* Create a map that will be registered/unregesitered by the backed bpf_link */ BPF_F_LINK = (1U << 13), + +/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */ + BPF_F_PATH_FD = (1U << 14), }; /* Flags for BPF_PROG_QUERY. */ @@ -1420,6 +1423,13 @@ union bpf_attr { __aligned_u64 pathname; __u32 bpf_fd; __u32 file_flags; + /* Same as dirfd in openat() syscall; see openat(2) + * manpage for details of path FD and pathname semantics; + * path_fd should accompanied by BPF_F_PATH_FD flag set in + * file_flags field, otherwise it should be set to zero; + * if BPF_F_PATH_FD flag is not set, AT_FDCWD is assumed. + */ + __s32 path_fd; }; struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ From patchwork Mon May 22 23:29:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13251264 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB220C77B73 for ; Tue, 23 May 2023 00:24:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234051AbjEWAY4 convert rfc822-to-8bit (ORCPT ); Mon, 22 May 2023 20:24:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235180AbjEVXaH (ORCPT ); Mon, 22 May 2023 19:30:07 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8AEE2E56 for ; Mon, 22 May 2023 16:29:34 -0700 (PDT) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 34MMlnCm013243 for ; Mon, 22 May 2023 16:29:34 -0700 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3qpuwqpbpv-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 22 May 2023 16:29:33 -0700 Received: from twshared25760.37.frc1.facebook.com (2620:10d:c0a8:1b::2d) by mail.thefacebook.com (2620:10d:c0a8:82::e) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Mon, 22 May 2023 16:29:27 -0700 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id 49418312BC08B; Mon, 22 May 2023 16:29:25 -0700 (PDT) From: Andrii Nakryiko To: , , , CC: , , , , Andrii Nakryiko Subject: [PATCH v3 bpf-next 3/4] libbpf: add opts-based bpf_obj_pin() API and add support for path_fd Date: Mon, 22 May 2023 16:29:16 -0700 Message-ID: <20230522232917.2454595-4-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230522232917.2454595-1-andrii@kernel.org> References: <20230522232917.2454595-1-andrii@kernel.org> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: Mm1Hg_GB24RdsbvK71GB7AeDssW9UXMN X-Proofpoint-GUID: Mm1Hg_GB24RdsbvK71GB7AeDssW9UXMN X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-05-22_17,2023-05-22_03,2023-05-22_02 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add path_fd support for bpf_obj_pin() and bpf_obj_get() operations (through their opts-based variants). This allows to take advantage of new kernel-side support for O_PATH-based pin/get location specification. Signed-off-by: Andrii Nakryiko --- tools/lib/bpf/bpf.c | 17 ++++++++++++++--- tools/lib/bpf/bpf.h | 18 ++++++++++++++++-- tools/lib/bpf/libbpf.map | 1 + 3 files changed, 31 insertions(+), 5 deletions(-) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 128ac723c4ea..ed86b37d8024 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -572,20 +572,30 @@ int bpf_map_update_batch(int fd, const void *keys, const void *values, __u32 *co (void *)keys, (void *)values, count, opts); } -int bpf_obj_pin(int fd, const char *pathname) +int bpf_obj_pin_opts(int fd, const char *pathname, const struct bpf_obj_pin_opts *opts) { - const size_t attr_sz = offsetofend(union bpf_attr, file_flags); + const size_t attr_sz = offsetofend(union bpf_attr, path_fd); union bpf_attr attr; int ret; + if (!OPTS_VALID(opts, bpf_obj_pin_opts)) + return libbpf_err(-EINVAL); + memset(&attr, 0, attr_sz); + attr.path_fd = OPTS_GET(opts, path_fd, 0); attr.pathname = ptr_to_u64((void *)pathname); + attr.file_flags = OPTS_GET(opts, file_flags, 0); attr.bpf_fd = fd; ret = sys_bpf(BPF_OBJ_PIN, &attr, attr_sz); return libbpf_err_errno(ret); } +int bpf_obj_pin(int fd, const char *pathname) +{ + return bpf_obj_pin_opts(fd, pathname, NULL); +} + int bpf_obj_get(const char *pathname) { return bpf_obj_get_opts(pathname, NULL); @@ -593,7 +603,7 @@ int bpf_obj_get(const char *pathname) int bpf_obj_get_opts(const char *pathname, const struct bpf_obj_get_opts *opts) { - const size_t attr_sz = offsetofend(union bpf_attr, file_flags); + const size_t attr_sz = offsetofend(union bpf_attr, path_fd); union bpf_attr attr; int fd; @@ -601,6 +611,7 @@ int bpf_obj_get_opts(const char *pathname, const struct bpf_obj_get_opts *opts) return libbpf_err(-EINVAL); memset(&attr, 0, attr_sz); + attr.path_fd = OPTS_GET(opts, path_fd, 0); attr.pathname = ptr_to_u64((void *)pathname); attr.file_flags = OPTS_GET(opts, file_flags, 0); diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index a2c091389b18..9aa0ee473754 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -284,16 +284,30 @@ LIBBPF_API int bpf_map_update_batch(int fd, const void *keys, const void *values __u32 *count, const struct bpf_map_batch_opts *opts); -struct bpf_obj_get_opts { +struct bpf_obj_pin_opts { size_t sz; /* size of this struct for forward/backward compatibility */ __u32 file_flags; + int path_fd; size_t :0; }; -#define bpf_obj_get_opts__last_field file_flags +#define bpf_obj_pin_opts__last_field path_fd LIBBPF_API int bpf_obj_pin(int fd, const char *pathname); +LIBBPF_API int bpf_obj_pin_opts(int fd, const char *pathname, + const struct bpf_obj_pin_opts *opts); + +struct bpf_obj_get_opts { + size_t sz; /* size of this struct for forward/backward compatibility */ + + __u32 file_flags; + int path_fd; + + size_t :0; +}; +#define bpf_obj_get_opts__last_field path_fd + LIBBPF_API int bpf_obj_get(const char *pathname); LIBBPF_API int bpf_obj_get_opts(const char *pathname, const struct bpf_obj_get_opts *opts); diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index a5aa3a383d69..7a4fe80da360 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -389,5 +389,6 @@ LIBBPF_1.2.0 { bpf_link__update_map; bpf_link_get_info_by_fd; bpf_map_get_info_by_fd; + bpf_obj_pin_opts; bpf_prog_get_info_by_fd; } LIBBPF_1.1.0; From patchwork Mon May 22 23:29:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13251298 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87BB4C77B73 for ; Tue, 23 May 2023 00:41:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232043AbjEWAlJ convert rfc822-to-8bit (ORCPT ); Mon, 22 May 2023 20:41:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34968 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232488AbjEVXe2 (ORCPT ); Mon, 22 May 2023 19:34:28 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8CC5F118 for ; Mon, 22 May 2023 16:32:16 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 34MMkIjL015313 for ; Mon, 22 May 2023 16:32:15 -0700 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3qpuy0qjg2-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 22 May 2023 16:32:15 -0700 Received: from twshared58712.02.prn6.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Mon, 22 May 2023 16:32:13 -0700 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id 55A77312BC0C1; Mon, 22 May 2023 16:29:27 -0700 (PDT) From: Andrii Nakryiko To: , , , CC: , , , , Andrii Nakryiko Subject: [PATCH v3 bpf-next 4/4] selftests/bpf: add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests Date: Mon, 22 May 2023 16:29:17 -0700 Message-ID: <20230522232917.2454595-5-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230522232917.2454595-1-andrii@kernel.org> References: <20230522232917.2454595-1-andrii@kernel.org> X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: D_dyvWX-0fB4mJnmMzg3r5iQTkCdqmPK X-Proofpoint-GUID: D_dyvWX-0fB4mJnmMzg3r5iQTkCdqmPK X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-05-22_17,2023-05-22_03,2023-05-22_02 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add a selftest demonstrating using detach-mounted BPF FS using new mount APIs, and pinning and getting BPF map using such mount. This demonstrates how something like container manager could setup BPF FS, pin and adjust all the necessary objects in it, all before exposing BPF FS to a particular mount namespace. Also add a few subtests validating all meaningful combinations of path_fd and pathname. We use mounted /sys/fs/bpf location for these. Signed-off-by: Andrii Nakryiko --- .../bpf/prog_tests/bpf_obj_pinning.c | 268 ++++++++++++++++++ 1 file changed, 268 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c b/tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c new file mode 100644 index 000000000000..31f1e815f671 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c @@ -0,0 +1,268 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include + +static inline int sys_fsopen(const char *fsname, unsigned flags) +{ + return syscall(__NR_fsopen, fsname, flags); +} + +static inline int sys_fsconfig(int fs_fd, unsigned cmd, const char *key, const void *val, int aux) +{ + return syscall(__NR_fsconfig, fs_fd, cmd, key, val, aux); +} + +static inline int sys_fsmount(int fs_fd, unsigned flags, unsigned ms_flags) +{ + return syscall(__NR_fsmount, fs_fd, flags, ms_flags); +} + +__attribute__((unused)) +static inline int sys_move_mount(int from_dfd, const char *from_path, + int to_dfd, const char *to_path, + unsigned int ms_flags) +{ + return syscall(__NR_move_mount, from_dfd, from_path, to_dfd, to_path, ms_flags); +} + +static void bpf_obj_pinning_detached(void) +{ + LIBBPF_OPTS(bpf_obj_pin_opts, pin_opts); + LIBBPF_OPTS(bpf_obj_get_opts, get_opts); + int fs_fd = -1, mnt_fd = -1; + int map_fd = -1, map_fd2 = -1; + int zero = 0, src_value, dst_value, err; + const char *map_name = "fsmount_map"; + + /* A bunch of below UAPI calls are constructed based on reading: + * https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html + */ + + /* create VFS context */ + fs_fd = sys_fsopen("bpf", 0); + if (!ASSERT_GE(fs_fd, 0, "fs_fd")) + goto cleanup; + + /* instantiate FS object */ + err = sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); + if (!ASSERT_OK(err, "fs_create")) + goto cleanup; + + /* create O_PATH fd for detached mount */ + mnt_fd = sys_fsmount(fs_fd, 0, 0); + if (!ASSERT_GE(mnt_fd, 0, "mnt_fd")) + goto cleanup; + + /* If we wanted to expose detached mount in the file system, we'd do + * something like below. But the whole point is that we actually don't + * even have to expose BPF FS in the file system to be able to work + * (pin/get objects) with it. + * + * err = sys_move_mount(mnt_fd, "", -EBADF, mnt_path, MOVE_MOUNT_F_EMPTY_PATH); + * if (!ASSERT_OK(err, "move_mount")) + * goto cleanup; + */ + + /* create BPF map to pin */ + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, map_name, 4, 4, 1, NULL); + if (!ASSERT_GE(map_fd, 0, "map_fd")) + goto cleanup; + + /* pin BPF map into detached BPF FS through mnt_fd */ + pin_opts.file_flags = BPF_F_PATH_FD; + pin_opts.path_fd = mnt_fd; + err = bpf_obj_pin_opts(map_fd, map_name, &pin_opts); + if (!ASSERT_OK(err, "map_pin")) + goto cleanup; + + /* get BPF map from detached BPF FS through mnt_fd */ + get_opts.file_flags = BPF_F_PATH_FD; + get_opts.path_fd = mnt_fd; + map_fd2 = bpf_obj_get_opts(map_name, &get_opts); + if (!ASSERT_GE(map_fd2, 0, "map_get")) + goto cleanup; + + /* update map through one FD */ + src_value = 0xcafebeef; + err = bpf_map_update_elem(map_fd, &zero, &src_value, 0); + ASSERT_OK(err, "map_update"); + + /* check values written/read through different FDs do match */ + dst_value = 0; + err = bpf_map_lookup_elem(map_fd2, &zero, &dst_value); + ASSERT_OK(err, "map_lookup"); + ASSERT_EQ(dst_value, src_value, "map_value_eq1"); + ASSERT_EQ(dst_value, 0xcafebeef, "map_value_eq2"); + +cleanup: + if (map_fd >= 0) + ASSERT_OK(close(map_fd), "close_map_fd"); + if (map_fd2 >= 0) + ASSERT_OK(close(map_fd2), "close_map_fd2"); + if (fs_fd >= 0) + ASSERT_OK(close(fs_fd), "close_fs_fd"); + if (mnt_fd >= 0) + ASSERT_OK(close(mnt_fd), "close_mnt_fd"); +} + +enum path_kind +{ + PATH_STR_ABS, + PATH_STR_REL, + PATH_FD_REL, +}; + +static void validate_pin(int map_fd, const char *map_name, int src_value, + enum path_kind path_kind) +{ + LIBBPF_OPTS(bpf_obj_pin_opts, pin_opts); + char abs_path[PATH_MAX], old_cwd[PATH_MAX]; + const char *pin_path = NULL; + int zero = 0, dst_value, map_fd2, err; + + snprintf(abs_path, sizeof(abs_path), "/sys/fs/bpf/%s", map_name); + old_cwd[0] = '\0'; + + switch (path_kind) { + case PATH_STR_ABS: + /* absolute path */ + pin_path = abs_path; + break; + case PATH_STR_REL: + /* cwd + relative path */ + ASSERT_OK_PTR(getcwd(old_cwd, sizeof(old_cwd)), "getcwd"); + ASSERT_OK(chdir("/sys/fs/bpf"), "chdir"); + pin_path = map_name; + break; + case PATH_FD_REL: + /* dir fd + relative path */ + pin_opts.file_flags = BPF_F_PATH_FD; + pin_opts.path_fd = open("/sys/fs/bpf", O_PATH); + ASSERT_GE(pin_opts.path_fd, 0, "path_fd"); + pin_path = map_name; + break; + } + + /* pin BPF map using specified path definition */ + err = bpf_obj_pin_opts(map_fd, pin_path, &pin_opts); + ASSERT_OK(err, "obj_pin"); + + /* cleanup */ + if (pin_opts.path_fd >= 0) + close(pin_opts.path_fd); + if (old_cwd[0]) + ASSERT_OK(chdir(old_cwd), "restore_cwd"); + + map_fd2 = bpf_obj_get(abs_path); + if (!ASSERT_GE(map_fd2, 0, "map_get")) + goto cleanup; + + /* update map through one FD */ + err = bpf_map_update_elem(map_fd, &zero, &src_value, 0); + ASSERT_OK(err, "map_update"); + + /* check values written/read through different FDs do match */ + dst_value = 0; + err = bpf_map_lookup_elem(map_fd2, &zero, &dst_value); + ASSERT_OK(err, "map_lookup"); + ASSERT_EQ(dst_value, src_value, "map_value_eq"); +cleanup: + if (map_fd2 >= 0) + ASSERT_OK(close(map_fd2), "close_map_fd2"); + unlink(abs_path); +} + +static void validate_get(int map_fd, const char *map_name, int src_value, + enum path_kind path_kind) +{ + LIBBPF_OPTS(bpf_obj_get_opts, get_opts); + char abs_path[PATH_MAX], old_cwd[PATH_MAX]; + const char *pin_path = NULL; + int zero = 0, dst_value, map_fd2, err; + + snprintf(abs_path, sizeof(abs_path), "/sys/fs/bpf/%s", map_name); + /* pin BPF map using specified path definition */ + err = bpf_obj_pin(map_fd, abs_path); + if (!ASSERT_OK(err, "pin_map")) + return; + + old_cwd[0] = '\0'; + + switch (path_kind) { + case PATH_STR_ABS: + /* absolute path */ + pin_path = abs_path; + break; + case PATH_STR_REL: + /* cwd + relative path */ + ASSERT_OK_PTR(getcwd(old_cwd, sizeof(old_cwd)), "getcwd"); + ASSERT_OK(chdir("/sys/fs/bpf"), "chdir"); + pin_path = map_name; + break; + case PATH_FD_REL: + /* dir fd + relative path */ + get_opts.file_flags = BPF_F_PATH_FD; + get_opts.path_fd = open("/sys/fs/bpf", O_PATH); + ASSERT_GE(get_opts.path_fd, 0, "path_fd"); + pin_path = map_name; + break; + } + + map_fd2 = bpf_obj_get_opts(pin_path, &get_opts); + if (!ASSERT_GE(map_fd2, 0, "map_get")) + goto cleanup; + + /* cleanup */ + if (get_opts.path_fd >= 0) + close(get_opts.path_fd); + if (old_cwd[0]) + ASSERT_OK(chdir(old_cwd), "restore_cwd"); + + /* update map through one FD */ + err = bpf_map_update_elem(map_fd, &zero, &src_value, 0); + ASSERT_OK(err, "map_update"); + + /* check values written/read through different FDs do match */ + dst_value = 0; + err = bpf_map_lookup_elem(map_fd2, &zero, &dst_value); + ASSERT_OK(err, "map_lookup"); + ASSERT_EQ(dst_value, src_value, "map_value_eq"); +cleanup: + if (map_fd2 >= 0) + ASSERT_OK(close(map_fd2), "close_map_fd2"); + unlink(abs_path); +} + +static void bpf_obj_pinning_mounted(enum path_kind path_kind) +{ + const char *map_name = "mounted_map"; + int map_fd; + + /* create BPF map to pin */ + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, map_name, 4, 4, 1, NULL); + if (!ASSERT_GE(map_fd, 0, "map_fd")) + return; + + validate_pin(map_fd, map_name, 100 + (int)path_kind, path_kind); + validate_get(map_fd, map_name, 200 + (int)path_kind, path_kind); + ASSERT_OK(close(map_fd), "close_map_fd"); +} + +void test_bpf_obj_pinning() +{ + if (test__start_subtest("detached")) + bpf_obj_pinning_detached(); + if (test__start_subtest("mounted-str-abs")) + bpf_obj_pinning_mounted(PATH_STR_ABS); + if (test__start_subtest("mounted-str-rel")) + bpf_obj_pinning_mounted(PATH_STR_REL); + if (test__start_subtest("mounted-fd-rel")) + bpf_obj_pinning_mounted(PATH_FD_REL); +}