From patchwork Sun Mar 26 09:21:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188009 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C47DCC74A5B for ; Sun, 26 Mar 2023 09:22:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230286AbjCZJWU (ORCPT ); Sun, 26 Mar 2023 05:22:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229596AbjCZJWT (ORCPT ); Sun, 26 Mar 2023 05:22:19 -0400 Received: from mail-qv1-xf29.google.com (mail-qv1-xf29.google.com [IPv6:2607:f8b0:4864:20::f29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D464540EE; Sun, 26 Mar 2023 02:22:18 -0700 (PDT) Received: by mail-qv1-xf29.google.com with SMTP id qh28so4870902qvb.7; Sun, 26 Mar 2023 02:22:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822538; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=TwHHUn2KJ1qVHrX1VAd8snE/lpVIPkRNAfXb0ctK8EA=; b=VmnNzVvbOE2xFNfHTtV8ssb2ax9uq2cUpvbAsBKyuOhvL/pKhQHH5D1N1paKdpk2pZ OrYkUh/mTYLLSE/Xc5KjyO1ELmj2oWzNAaiZCEJv3U6CYvBEwJ3iMsy20Z8r4pT/Hd8B 57KvlQz2ZAgauzaPME5YjeGKXdwLsAyK6i+rPGxCfOGdfuPqAfPQ4xootiof17ZxCdDE JVq+cfd0qguE9m5enVYLoD2qHmmeS081/qldllbut0vVaOsjHuKBvafUAL+zVdvoEpOY k55CWTKkJZ+BxKWDCZVTDVX8LwvomU+0kHNxuCrtbj5j36Np1AUgHHyUQxMeRpBxmykK efzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822538; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TwHHUn2KJ1qVHrX1VAd8snE/lpVIPkRNAfXb0ctK8EA=; b=kxhx5/QutIxIiaCHeB6iif09i7bicAhwIVnQSU6dufEMkV9Y0dlnIjrWYiGkmi7tS6 oIyhXRmoKhkrMqhCS9GHNBv2n0TNJqv7Aza0n+nqmcwxDimSeldoR9vUmntVF98HF6BL QCF2VEWL2J3WoIXw0/1ELOocmtPvm/dMmY97B01/MfzsJFbnQLIqmKu/dcq3Kv38QxTD fZbsK6D1CrrGI96xieMLULqPsjUzBpLh0eazlUgBbAdkjLLURjvIGRrqNHQ0OS5uRq4O R52n1LMY0wbg05QtRBVWTl58YpWwi0JQbaC6gzOSsLeIkUk9HERvBF2jckdaNxErhpSx Peqw== X-Gm-Message-State: AAQBX9dJ0/8u59H0OYEwL3zAcNkVw1LV9FmCCJcMtFaLFyv2Zu2d5KF7 N+50vV/3NNjfnuVSJ5e64Ek= X-Google-Smtp-Source: AKy350bM+eTpjT6YC/4vhOCRdBwtCCa0Ktv98B2UHSYV2R+F2nF83kYyHhD6zO1xGb9E6pY3wmDMZA== X-Received: by 2002:a05:6214:1c45:b0:5bd:14f9:650f with SMTP id if5-20020a0562141c4500b005bd14f9650fmr14305482qvb.45.1679822537938; Sun, 26 Mar 2023 02:22:17 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:17 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 01/13] fork: New clone3 flag for BPF namespace Date: Sun, 26 Mar 2023 09:21:56 +0000 Message-Id: <20230326092208.13613-2-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC A new clone3 flag CLONE_NEWBPF is introduced to create a new BPF namespace. 
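As a rough illustration of how userspace might request the new namespace, here is a minimal sketch. It assumes a kernel with this RFC series applied; CLONE_NEWBPF is not in released uapi headers, so the value is defined locally from this patch. struct bpfns_clone_args and both helper names are hypothetical; the struct mirrors only the first 64 bytes (version 0) of the kernel's struct clone_args. Because the flag lies above bit 31, it is passed through clone3() rather than the legacy clone() flags word.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value taken from this patch; not present in released uapi headers. */
#ifndef CLONE_NEWBPF
#define CLONE_NEWBPF 0x400000000ULL
#endif

/* Hypothetical local mirror of the first 64 bytes of struct clone_args. */
struct bpfns_clone_args {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
};

_Static_assert(sizeof(struct bpfns_clone_args) == 64,
	       "must match the version 0 clone_args size");

/* Hypothetical helper: fork-like arguments plus the new namespace flag. */
void fill_bpfns_clone_args(struct bpfns_clone_args *args)
{
	memset(args, 0, sizeof(*args));
	args->flags = CLONE_NEWBPF;
	args->exit_signal = SIGCHLD;
}

/*
 * Hypothetical helper: returns 0 in the child, the child's pid in the
 * parent, or -1 with errno set. Kernels without this series reject the
 * unknown flag with EINVAL.
 */
long clone3_newbpf(void)
{
#ifdef SYS_clone3
	struct bpfns_clone_args args;

	fill_bpfns_clone_args(&args);
	return syscall(SYS_clone3, &args, sizeof(args));
#else
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would treat a return value of 0 as "running in the child, inside the new BPF namespace" and a positive value as the child's pid. Creating a namespace this way presumably requires CAP_SYS_ADMIN, as for the other namespace flags.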
Signed-off-by: Yafang Shao --- include/uapi/linux/sched.h | 1 + kernel/fork.c | 5 +++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 3bac0a8..ace31df 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -36,6 +36,7 @@ /* Flags for the clone3() syscall. */ #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */ #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */ +#define CLONE_NEWBPF 0x400000000ULL /* New BPF namespace */ /* * cloning flags intersect with CSIGNAL so can be used with unshare and clone3 diff --git a/kernel/fork.c b/kernel/fork.c index f68954d..db0abd4 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2929,7 +2929,8 @@ static bool clone3_args_valid(struct kernel_clone_args *kargs) { /* Verify that no unknown flags are passed along. */ if (kargs->flags & - ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND | CLONE_INTO_CGROUP)) + ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND | CLONE_INTO_CGROUP | + CLONE_NEWBPF)) return false; /* @@ -3080,7 +3081,7 @@ static int check_unshare_flags(unsigned long unshare_flags) CLONE_VM|CLONE_FILES|CLONE_SYSVSEM| CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET| CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP| - CLONE_NEWTIME)) + CLONE_NEWTIME|CLONE_NEWBPF)) return -EINVAL; /* * Not implemented, but pretend it works if there is nothing From patchwork Sun Mar 26 09:21:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188011 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9032AC77B62 for ; Sun, 26 Mar 2023 09:22:25 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231833AbjCZJWX (ORCPT ); Sun, 26 Mar 2023 05:22:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230192AbjCZJWU (ORCPT ); Sun, 26 Mar 2023 05:22:20 -0400 Received: from mail-qt1-x82f.google.com (mail-qt1-x82f.google.com [IPv6:2607:f8b0:4864:20::82f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B8F099032; Sun, 26 Mar 2023 02:22:19 -0700 (PDT) Received: by mail-qt1-x82f.google.com with SMTP id bz27so5883656qtb.1; Sun, 26 Mar 2023 02:22:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822539; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=fqlnazOt7kJ3822TP+aL3SiF//u+HTRXad+baVnErHY=; b=ULRmL+goUMzVI5YM5Y3Tyb6Q3Ydk4wgw8+6UtQFRH6EdEXFLldFizOc9kXUV0CRl/o OwLu+mZCMJ7VUWMBNtO9bEU3nRuILHZtYR6BYOzE8dPY2AUmKxv9/8WJEF138QpnKxsa +kAu7TUP7430uq1suJArrJQU+HQ+6vvvKQhK6HCCsAovvHrMlr9FG0Xchxfk5jHwV+q/ zmPpsOwGDOG8odIJR86LI5L++nbeSpoPDyXGCHaykARBdbfNBazeOlLoirujo51KSOmN ROb48wG0E2fmz2aBbOsjusCsoJPCq19F3r9/D8m9KLog+wmLgTpS0/8d1C112df0pn8D VmYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822539; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fqlnazOt7kJ3822TP+aL3SiF//u+HTRXad+baVnErHY=; b=PTi/khpkbT5ftwnbkTRCemBQfJ1QETMY8fksWmgu0Ea1wA3/CO76z9M0RzaeJ+oecj V+BrXn6aDpvF9+RgOjxqjAfxppFjgO+SmHkZDZCMKA8ZD9juXW+N14neuGcLA/XnfArO 8dNucy6Vhp1pMw1dYfMRfDUl7t5SUU3h1c2aquNCmYsv3KveiiaYEkEIVydr9O8p7llJ 33XwS7+1T6ibQYVvtvXNBhuvjU1e9yQpET9VjsqPQc7HqCR54wKnDonPeEzitGt0DIii 0Nu4kRd2kH+BObXj+/64vyR6YE8g+HkTH41CYbTYdkDVRn6BB08evRIBpDzx1XaGmnyo CsVQ== X-Gm-Message-State: 
AO0yUKXteKmKXvdFMKhTEvIU9GMFO7m3BbS+wv1blKuis2MznGbd/dYx pm7/TDsHDaI3zhWSCqohpXI= X-Google-Smtp-Source: AK7set+qWJoVPRpZDJF9uabDzOXssV1aMtfo3xOWWbAi38faUnSMo8mfC7IGQwWLWU01/6M+R0ifFA== X-Received: by 2002:a05:622a:c6:b0:3b6:2c3b:8c00 with SMTP id p6-20020a05622a00c600b003b62c3b8c00mr15120345qtw.66.1679822538836; Sun, 26 Mar 2023 02:22:18 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:18 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 02/13] proc_ns: Extend the field type in struct proc_ns_operations to long Date: Sun, 26 Mar 2023 09:21:57 +0000 Message-Id: <20230326092208.13613-3-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC In struct proc_ns_operations, the field 'type' holds the namespace's clone flag. As the newly introduced CLONE_NEWBPF does not fit in 32 bits, we also need to extend this field from int to long to accommodate it.
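The width problem can be checked with a small standalone sketch. The helper names and the EXAMPLE_ prefixed macros are hypothetical; the values are the CLONE_NEWPID flag from include/uapi/linux/sched.h and the CLONE_NEWBPF value added by patch 01. Fixed-width types are used on purpose, since long is itself only 32 bits wide on 32-bit architectures.

```c
#include <assert.h>
#include <stdint.h>

/* Existing namespace flag, and the flag added by patch 01 of this series. */
#define EXAMPLE_CLONE_NEWPID 0x20000000ULL
#define EXAMPLE_CLONE_NEWBPF 0x400000000ULL

/* Returns 1 if the value is unchanged after a round trip through 32 bits. */
int flag_fits_in_32_bits(uint64_t flag)
{
	return (uint64_t)(uint32_t)flag == flag;
}

/* Returns the index of the highest set bit, or -1 if no bit is set. */
int flag_top_bit(uint64_t flag)
{
	int bit = -1;

	while (flag) {
		flag >>= 1;
		bit++;
	}
	return bit;
}
```

With these helpers, the existing flag fits in 32 bits while the new one does not: truncating it to 32 bits yields 0, which is why a 32-bit 'type' field could never match it.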
Signed-off-by: Yafang Shao --- include/linux/proc_ns.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h index 75807ec..555c257 100644 --- a/include/linux/proc_ns.h +++ b/include/linux/proc_ns.h @@ -16,7 +16,7 @@ struct proc_ns_operations { const char *name; const char *real_ns_name; - int type; + long type; struct ns_common *(*get)(struct task_struct *task); void (*put)(struct ns_common *ns); int (*install)(struct nsset *nsset, struct ns_common *ns); From patchwork Sun Mar 26 09:21:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188012 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5AD77C77B60 for ; Sun, 26 Mar 2023 09:22:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231841AbjCZJWY (ORCPT ); Sun, 26 Mar 2023 05:22:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43028 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231786AbjCZJWW (ORCPT ); Sun, 26 Mar 2023 05:22:22 -0400 Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D36140EE; Sun, 26 Mar 2023 02:22:20 -0700 (PDT) Received: by mail-qt1-x836.google.com with SMTP id n14so5854766qta.10; Sun, 26 Mar 2023 02:22:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822540; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rhyLQ7UYumv+WDRskz2d3S0DvVb7k7Pdnk50s5ULN88=; 
b=LgWjCfQPW96aOHnMVq1nRspU1h2cX8WSfoqtvehGvZeGVM8Fk+1z38MitJwaD+VQ1m a1DzigR/DlstuMePJAOE+PL1hi/Wv43CjyP9atiTXEnE8D6Lw9gyB8ZkcYochl3NYyz/ 1MkBiF+Fgkt4WlOCeBnrpIzaUROEApGiYPo8aWJ4FGR5cvvRN+yJlWbgnQDP7cB1gG6w jg5TcJhjRTPJ11kKAG6su2RmC8OSoAforxKy/Sdo9aQcXhwX09QGj/D87Hu/q2QO83vE a+5/SM/apSkAfZmjC+SSZ4H5yRkd7x27BZPsoxb3F6ImVuoIov1+WFqZVi+S9jx8A88u Gp5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822540; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rhyLQ7UYumv+WDRskz2d3S0DvVb7k7Pdnk50s5ULN88=; b=1hLrEjLN3jB3EHZsKqi25UHW7KbGUp0HI39bZqDAM3ded7hwlEwzAaNusbF7RUf/NS gewDRuRtu6DixI6dHgOWXSrWfLrL4N3+bp56lQOdWI3FyGSBgMrvWDxp605OSye9KukE 89rhR0xEyftr1VKv94EIY9apBzy6eHdKHT92VgACOwnWvUmC+LqrV2gMSuWOTR56yy2g pyzkR3AWhhpmFdK4+z+c2OvKfSXjQx7sA+XZH4xPosQ1IRNHvbub5QfW9wip5130oRp+ YN8epdOtlihoq33jHVo1lL8yGLP0EVJ71ezopJK2BDWKLTou2yG0d/jXlOnOvPhgb4hn r42Q== X-Gm-Message-State: AO0yUKXvnA5+aYAyPWTP42UL7Gyi+YJ1EmFpSQW13SUxwlAd5b4qS0wS bXrNmGM5PqSSDWHAi+q3fEg= X-Google-Smtp-Source: AK7set/MLQGWLPcaWrkNF1oMgaLnh9FItragX1D+PrIw2lcRq9FVElYDRq+Y+MS72LnghzlsCCsRJg== X-Received: by 2002:ac8:5fcd:0:b0:3de:94da:4fd7 with SMTP id k13-20020ac85fcd000000b003de94da4fd7mr16056824qta.39.1679822539726; Sun, 26 Mar 2023 02:22:19 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:19 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 03/13] 
bpf: Implement bpf namespace Date: Sun, 26 Mar 2023 09:21:58 +0000 Message-Id: <20230326092208.13613-4-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC The BPF namespace is similar to the pid namespace. When a new bpf object is created in a child BPF namespace, an ID is allocated for it in the current BPF namespace and in each of its ancestor BPF namespaces. The hierarchy is as follows, init_bpf_ns : level = 0 / \ child_a child_b : level = 1 / \ child_b_a child_b_b : level = 2 When we create a bpf object in child_b_b, IDs are allocated for this object in child_b_b, child_b and init_bpf_ns. IDs for bpf_map, bpf_prog and bpf_link are allocated in the bpf namespace. Signed-off-by: Yafang Shao --- fs/proc/namespaces.c | 4 + include/linux/bpf_namespace.h | 46 +++++++++ include/linux/nsproxy.h | 4 + include/linux/proc_ns.h | 1 + include/linux/user_namespace.h | 1 + kernel/bpf/Makefile | 1 + kernel/bpf/bpf_namespace.c | 219 +++++++++++++++++++++++++++++++++++++++++ kernel/nsproxy.c | 19 +++- kernel/ucount.c | 1 + 9 files changed, 294 insertions(+), 2 deletions(-) create mode 100644 include/linux/bpf_namespace.h create mode 100644 kernel/bpf/bpf_namespace.c diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c index 8e159fc..1a36757 100644 --- a/fs/proc/namespaces.c +++ b/fs/proc/namespaces.c @@ -9,6 +9,7 @@ #include #include #include +#include #include "internal.h" @@ -37,6 +38,9 @@ &timens_operations, &timens_for_children_operations, #endif +#ifdef CONFIG_BPF + &bpfns_operations, +#endif }; static const char *proc_ns_get_link(struct dentry *dentry, diff --git a/include/linux/bpf_namespace.h b/include/linux/bpf_namespace.h new file mode 100644 index 0000000..06aa51f --- /dev/null +++ b/include/linux/bpf_namespace.h @@ -0,0 +1,46 @@ +/*
SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_BPF_ID_NS_H +#define _LINUX_BPF_ID_NS_H +#include +#include +#include +#include + +struct ubpf_obj_id { + int nr; + struct bpf_namespace *ns; +}; + +struct bpf_obj_id { + refcount_t count; + unsigned int level; + struct rcu_head rcu; + struct ubpf_obj_id numbers[1]; +}; + +enum { + MAP_OBJ_ID = 0, + PROG_OBJ_ID, + LINK_OBJ_ID, + OBJ_ID_NUM, +}; + +struct bpf_namespace { + struct idr idr[OBJ_ID_NUM]; + struct rcu_head rcu; + int level; + struct ns_common ns; + struct user_namespace *user_ns; + struct kmem_cache *obj_id_cachep; + struct bpf_namespace *parent; + struct ucounts *ucounts; +}; + +extern struct bpf_namespace init_bpf_ns; +extern struct proc_ns_operations bpfns_operations; + +struct bpf_namespace *copy_bpfns(unsigned long flags, + struct user_namespace *user_ns, + struct bpf_namespace *old_ns); +void put_bpfns(struct bpf_namespace *ns); +#endif /* _LINUX_BPF_ID_NS_H */ diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h index fee881c..d24ab6b 100644 --- a/include/linux/nsproxy.h +++ b/include/linux/nsproxy.h @@ -10,6 +10,9 @@ struct ipc_namespace; struct pid_namespace; struct cgroup_namespace; +#ifdef CONFIG_BPF +struct bpf_namespace; +#endif struct fs_struct; /* @@ -38,6 +41,7 @@ struct nsproxy { struct time_namespace *time_ns; struct time_namespace *time_ns_for_children; struct cgroup_namespace *cgroup_ns; + struct bpf_namespace *bpf_ns; }; extern struct nsproxy init_nsproxy; diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h index 555c257..c10ce2c 100644 --- a/include/linux/proc_ns.h +++ b/include/linux/proc_ns.h @@ -46,6 +46,7 @@ enum { PROC_PID_INIT_INO = 0xEFFFFFFCU, PROC_CGROUP_INIT_INO = 0xEFFFFFFBU, PROC_TIME_INIT_INO = 0xEFFFFFFAU, + PROC_BPF_INIT_INO = 0xEFFFFFF9U, }; #ifdef CONFIG_PROC_FS diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 45f09be..93eb618 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ 
-54,6 +54,7 @@ enum ucount_type { UCOUNT_FANOTIFY_GROUPS, UCOUNT_FANOTIFY_MARKS, #endif + UCOUNT_BPF_NAMESPACES, UCOUNT_COUNTS, }; diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 0224261..828aef0 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -44,3 +44,4 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/ obj-$(CONFIG_BPF_SYSCALL) += relo_core.o $(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE $(call if_changed_rule,cc_o_c) +obj-$(CONFIG_BPF_SYSCALL) += bpf_namespace.o diff --git a/kernel/bpf/bpf_namespace.c b/kernel/bpf/bpf_namespace.c new file mode 100644 index 0000000..88a86cd --- /dev/null +++ b/kernel/bpf/bpf_namespace.c @@ -0,0 +1,219 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MAX_BPF_NS_LEVEL 32 +static struct kmem_cache *bpfns_cachep; +static struct kmem_cache *obj_id_cache[MAX_PID_NS_LEVEL]; +static struct ns_common *bpfns_get(struct task_struct *task); +static void bpfns_put(struct ns_common *ns); +static struct kmem_cache *create_bpf_cachep(unsigned int level); +static DEFINE_MUTEX(obj_id_caches_mutex); + +static int bpfns_install(struct nsset *nsset, struct ns_common *ns) +{ + pr_info("setns not supported for bpf namespace"); + return -EOPNOTSUPP; +} + +struct proc_ns_operations bpfns_operations = { + .name = "bpf", + .type = CLONE_NEWBPF, + .get = bpfns_get, + .put = bpfns_put, + .install = bpfns_install, +}; + +struct bpf_namespace init_bpf_ns = { + .level = 0, + .user_ns = &init_user_ns, + .ns.ops = &bpfns_operations, + .ns.inum = PROC_BPF_INIT_INO, +}; + +static struct bpf_namespace *get_bpfns(struct bpf_namespace *ns) +{ + if (ns != &init_bpf_ns) + refcount_inc(&ns->ns.count); + return ns; +} + +static struct ns_common *bpfns_get(struct task_struct *task) +{ + struct ns_common *ns = NULL; + struct nsproxy *nsproxy; + + rcu_read_lock(); + nsproxy = task->nsproxy; + if (nsproxy) { + ns = &nsproxy->bpf_ns->ns; 
+ get_bpfns(container_of(ns, struct bpf_namespace, ns)); + } + rcu_read_unlock(); + return ns; +} + +static struct ucounts *inc_bpf_namespaces(struct user_namespace *ns) +{ + return inc_ucount(ns, current_euid(), UCOUNT_BPF_NAMESPACES); +} + +static void dec_bpf_namespaces(struct ucounts *ucounts) +{ + dec_ucount(ucounts, UCOUNT_BPF_NAMESPACES); +} + +static void delayed_free_bpfns(struct rcu_head *p) +{ + struct bpf_namespace *ns = container_of(p, struct bpf_namespace, rcu); + + dec_bpf_namespaces(ns->ucounts); + put_user_ns(ns->user_ns); + kmem_cache_free(bpfns_cachep, ns); +} + +static void destroy_bpf_namespace(struct bpf_namespace *ns) +{ + int i; + + ns_free_inum(&ns->ns); + for (i = 0; i < OBJ_ID_NUM; i++) + idr_destroy(&ns->idr[i]); + call_rcu(&ns->rcu, delayed_free_bpfns); +} + +void put_bpfns(struct bpf_namespace *ns) +{ + struct bpf_namespace *parent; + + while (ns != &init_bpf_ns) { + parent = ns->parent; + if (!refcount_dec_and_test(&ns->ns.count)) + break; + destroy_bpf_namespace(ns); + ns = parent; + } +} + +static void bpfns_put(struct ns_common *ns) +{ + struct bpf_namespace *bpf_ns; + + bpf_ns = container_of(ns, struct bpf_namespace, ns); + put_bpfns(bpf_ns); +} + +static struct bpf_namespace * +create_bpf_namespace(struct user_namespace *user_ns, + struct bpf_namespace *parent_bpfns) +{ + struct bpf_namespace *ns; + unsigned int level = parent_bpfns->level + 1; + struct ucounts *ucounts; + int err; + int i; + + err = -EINVAL; + if (!in_userns(parent_bpfns->user_ns, user_ns)) + goto out; + + err = -ENOSPC; + if (level > MAX_BPF_NS_LEVEL) + goto out; + ucounts = inc_bpf_namespaces(user_ns); + if (!ucounts) + goto out; + + err = -ENOMEM; + ns = kmem_cache_zalloc(bpfns_cachep, GFP_KERNEL); + if (!ns) + goto out_dec; + + for (i = 0; i < OBJ_ID_NUM; i++) + idr_init(&ns->idr[i]); + + ns->obj_id_cachep = create_bpf_cachep(level); + if (!ns->obj_id_cachep) + goto out_free_idr; + + err = ns_alloc_inum(&ns->ns); + if (err) + goto out_free_idr; + ns->ns.ops 
= &bpfns_operations; + + refcount_set(&ns->ns.count, 1); + ns->level = level; + ns->parent = get_bpfns(parent_bpfns); + ns->user_ns = get_user_ns(user_ns); + ns->ucounts = ucounts; + return ns; + +out_free_idr: + for (i = 0; i < OBJ_ID_NUM; i++) + idr_destroy(&ns->idr[i]); + kmem_cache_free(bpfns_cachep, ns); +out_dec: + dec_bpf_namespaces(ucounts); +out: + return ERR_PTR(err); +} + +struct bpf_namespace *copy_bpfns(unsigned long flags, + struct user_namespace *user_ns, + struct bpf_namespace *old_ns) +{ + if (!(flags & CLONE_NEWBPF)) + return get_bpfns(old_ns); + return create_bpf_namespace(user_ns, old_ns); +} + +static struct kmem_cache *create_bpf_cachep(unsigned int level) +{ + /* Level 0 is init_bpf_ns.obj_id_cachep */ + struct kmem_cache **pkc = &obj_id_cache[level - 1]; + struct kmem_cache *kc; + char name[4 + 10 + 1]; + unsigned int len; + + kc = READ_ONCE(*pkc); + if (kc) + return kc; + + snprintf(name, sizeof(name), "bpf_%u", level + 1); + len = sizeof(struct bpf_obj_id) + level * sizeof(struct ubpf_obj_id); + mutex_lock(&obj_id_caches_mutex); + /* Name collision forces to do allocation under mutex. */ + if (!*pkc) + *pkc = kmem_cache_create(name, len, 0, + SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL); + mutex_unlock(&obj_id_caches_mutex); + /* current can fail, but someone else can succeed. 
*/ + return READ_ONCE(*pkc); +} + +static void __init bpfns_idr_init(void) +{ + int i; + + init_bpf_ns.obj_id_cachep = + KMEM_CACHE(pid, SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT); + for (i = 0; i < OBJ_ID_NUM; i++) + idr_init(&init_bpf_ns.idr[i]); +} + +static __init int bpf_namespaces_init(void) +{ + bpfns_cachep = KMEM_CACHE(bpf_namespace, SLAB_PANIC | SLAB_ACCOUNT); + bpfns_idr_init(); + return 0; +} + +late_initcall(bpf_namespaces_init); diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c index a487ff2..6a6fa70 100644 --- a/kernel/nsproxy.c +++ b/kernel/nsproxy.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -26,6 +27,7 @@ #include #include #include +#include static struct kmem_cache *nsproxy_cachep; @@ -47,6 +49,9 @@ struct nsproxy init_nsproxy = { .time_ns = &init_time_ns, .time_ns_for_children = &init_time_ns, #endif +#ifdef CONFIG_BPF + .bpf_ns = &init_bpf_ns, +#endif }; static inline struct nsproxy *create_nsproxy(void) @@ -121,8 +126,16 @@ static struct nsproxy *create_new_namespaces(unsigned long flags, } new_nsp->time_ns = get_time_ns(tsk->nsproxy->time_ns); + new_nsp->bpf_ns = copy_bpfns(flags, user_ns, tsk->nsproxy->bpf_ns); + if (IS_ERR(new_nsp->bpf_ns)) { + err = PTR_ERR(new_nsp->bpf_ns); + goto out_bpf; + } return new_nsp; +out_bpf: + put_time_ns(new_nsp->time_ns); + put_time_ns(new_nsp->time_ns_for_children); out_time: put_net(new_nsp->net_ns); out_net: @@ -156,7 +169,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk) if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNET | - CLONE_NEWCGROUP | CLONE_NEWTIME)))) { + CLONE_NEWCGROUP | CLONE_NEWTIME | CLONE_NEWBPF)))) { if ((flags & CLONE_VM) || likely(old_ns->time_ns_for_children == old_ns->time_ns)) { get_nsproxy(old_ns); @@ -203,6 +216,8 @@ void free_nsproxy(struct nsproxy *ns) put_time_ns(ns->time_ns_for_children); put_cgroup_ns(ns->cgroup_ns); put_net(ns->net_ns); + if (ns->bpf_ns) + 
put_bpfns(ns->bpf_ns); kmem_cache_free(nsproxy_cachep, ns); } @@ -218,7 +233,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags, if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP | - CLONE_NEWTIME))) + CLONE_NEWTIME | CLONE_NEWBPF))) return 0; user_ns = new_cred ? new_cred->user_ns : current_user_ns(); diff --git a/kernel/ucount.c b/kernel/ucount.c index ee8e57f..97e0ae3 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -87,6 +87,7 @@ static int set_permissions(struct ctl_table_header *head, UCOUNT_ENTRY("max_fanotify_groups"), UCOUNT_ENTRY("max_fanotify_marks"), #endif + UCOUNT_ENTRY("max_bpf_namespaces"), { } }; #endif /* CONFIG_SYSCTL */ From patchwork Sun Mar 26 09:21:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188013 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30207C6FD1C for ; Sun, 26 Mar 2023 09:22:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231896AbjCZJWZ (ORCPT ); Sun, 26 Mar 2023 05:22:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231817AbjCZJWW (ORCPT ); Sun, 26 Mar 2023 05:22:22 -0400 Received: from mail-qt1-x835.google.com (mail-qt1-x835.google.com [IPv6:2607:f8b0:4864:20::835]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7793B5BB1; Sun, 26 Mar 2023 02:22:21 -0700 (PDT) Received: by mail-qt1-x835.google.com with SMTP id a5so5875433qto.6; Sun, 26 Mar 2023 02:22:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822540; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=I1Zf6mBo0BAcs/anjiNOAvLM/6WnwvWROkn3ev6fwzU=; b=Xny1OUglEPhpvieEbLz2cvRjjAo5MlD0Obh5seAAppIv9GfutRhWTPb3tufsXZ53HN Yx7a1Uc79EAcYyoN4mTwOOzN5+508MyW09h2B5uau8cRtgZvtFz6pBpdIKBgUjjRPQo8 9ylDg4S6pkddVF/NEvDaFoPvAFiDK29IqcMvXyMmWp0xTMQoeUKLayA/YbQ/juEry+Rl O5MRw7j66yPLjPeHE2IbpwWAl9HiPQbjd5ZFTBH9aVR2d6i3TouSRE+peWBap7JP2GYX RpgoHp2Q6M2cYXQgnekplpEh/isWECLBxnXZUhXeJkfYtWDZ3XqSG5Xulh9KpfZi3JUz 9XjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822540; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=I1Zf6mBo0BAcs/anjiNOAvLM/6WnwvWROkn3ev6fwzU=; b=ppFgc9aXn54xsrG/aN8/ZQ8QWSyg8spoIHiCrz4XeHCgDR3Rvj7UQEUyE9TL2rU40Q IH9cF6o7aVBA7d1uBdH7BhzquaMzjjjT9E4EU+nbiHZtF9USyOBbSE4F7/KOgCXvSAvi V1yukz/xEzxaNhur2qKw4IdmWxC/t4SVy+BOf3MCF/gq6FSB6SBnn2c89KNS7TBKCBB/ ecGA3DtOuG8WMsJQG349Iq+KYj0S15yp6hzn0bkcd44rxwkMN3MgSkhQR1HaOre90lI5 9azKfrhhrEyRoFFQ+LATtKeon1cRmE0Bq8o+/Tvu4u6Y32KVIKbCLl1WvG3Zz8Y6FSYH e2jQ== X-Gm-Message-State: AO0yUKVyXjH1iKMfrYlk1WVyJSH5hdIlfQ6PErL/S2GY8VPwhJPf+unh Px26eupC66UZtagFIHWOlmU= X-Google-Smtp-Source: AK7set/Mdsbpbyyu1gXXbwRaNg6VaZ5oBPUNVmPZTuM6CJhHyddJuTChzPQfKcLO6345fEsLP2qouA== X-Received: by 2002:ac8:5c16:0:b0:3df:50ef:fafa with SMTP id i22-20020ac85c16000000b003df50effafamr14597478qti.4.1679822540545; Sun, 26 Mar 2023 02:22:20 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:20 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, 
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 04/13] bpf: No need to check if id is 0 Date: Sun, 26 Mar 2023 09:21:59 +0000 Message-Id: <20230326092208.13613-5-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC idr_alloc_cyclic() returns -ENOSPC if there are no available IDs, so there is no need to check whether the id is less than the start number. Signed-off-by: Yafang Shao --- kernel/bpf/syscall.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index e18ac7f..f3664f2 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -387,9 +387,6 @@ static int bpf_map_alloc_id(struct bpf_map *map) spin_unlock_bh(&map_idr_lock); idr_preload_end(); - if (WARN_ON_ONCE(!id)) - return -ENOSPC; - return id > 0 ? 0 : id; } @@ -2032,10 +2029,6 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog) spin_unlock_bh(&prog_idr_lock); idr_preload_end(); - /* id is in [1, INT_MAX) */ - if (WARN_ON_ONCE(!id)) - return -ENOSPC; - return id > 0 ?
0 : id; } From patchwork Sun Mar 26 09:22:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188014 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08AB1C77B6D for ; Sun, 26 Mar 2023 09:22:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229655AbjCZJW1 (ORCPT ); Sun, 26 Mar 2023 05:22:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43076 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231834AbjCZJWX (ORCPT ); Sun, 26 Mar 2023 05:22:23 -0400 Received: from mail-qt1-x82f.google.com (mail-qt1-x82f.google.com [IPv6:2607:f8b0:4864:20::82f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4D01F9032; Sun, 26 Mar 2023 02:22:22 -0700 (PDT) Received: by mail-qt1-x82f.google.com with SMTP id w25so5876524qtc.5; Sun, 26 Mar 2023 02:22:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822541; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=01YykE5VdF3mbRIggVDZ1yArVGph94w2fkqHep9wJ/M=; b=HDH3pJEizmyJMBwcqdhwW7RySQMEhGJzrjzbgOB8zDeEHz1yjmTofopEYWHefgJ2OR QcSB6O+IfmUfQnQA73rLVs/YvdRkyc7zTqzBVezuQ8j4geqUA90RDJyorvJf3uvQD+Yb MPlcrWgJZyg9zjGgajNSlgm4T60dvHQjXby77Uec5AMaNALib9jaXP0nphLXzyZvU1qF PpeWn9bGdsUysDNshGWuFXBOmAYrANq0okxkckRb2VIgyXBwmTw8DSjWm4dADyWWFw2M md4V5XO48fMRoCHOpbc4zJk5tZpRynM6BctAdB/PEwB7+pWIHum1Jy6yFiEPDtgKBPSt 9QjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822541; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=01YykE5VdF3mbRIggVDZ1yArVGph94w2fkqHep9wJ/M=; b=h5iD1eYYrXJPDKvyLYTPx+MUhdoBPWolmDeT8q0I0PokOHpiSV/JCtxD0C4HPmE6xx nTVJABrOX+h5/3R7ycYuSQWPGbH9mw5yXc8qYoZ+6FQu5606yRZ1M+5OiUom+/OrNtdN bPJ7Z1tdUMzMsC1dwLS+j+pwbGCdY+NVQNmQjOTYz5Y7yu+6gJ3d+vYSQLST8PgNFSuY EnOUbSBNrWMcxuAJTEBY2KtKGzx33wdydgcrCNEbF6esNBdApqPPYnbFHCdFqA7w1ct2 8LbFj8ONJOp8e52HFJ4WV/W/Cdm7odbJC16M0K7vtMrkLugo0z+IUHxd3McceMG2imDx pN8g== X-Gm-Message-State: AAQBX9d38bpQ7nSpYlLhUvGdDGY/uuenr4YlmjSCC4Ta2rN/dJpgfltD 6XnYv6QI8u91dXSwfa1JG64= X-Google-Smtp-Source: AKy350ZvRbDR2S6vDYQtlTB5zvkVrZFBwi/+I2BXm08Hu3WT5IMgowREeafYFAGFl+ZfxP5Qdri1XA== X-Received: by 2002:ac8:7fd4:0:b0:3e4:e61a:669c with SMTP id b20-20020ac87fd4000000b003e4e61a669cmr1164203qtk.8.1679822541457; Sun, 26 Mar 2023 02:22:21 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:21 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 05/13] bpf: Make bpf objects id have the same alloc and free pattern Date: Sun, 26 Mar 2023 09:22:00 +0000 Message-Id: <20230326092208.13613-6-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Make them follow the same pattern so that we can use the generic helpers
instead. Signed-off-by: Yafang Shao --- kernel/bpf/offload.c | 15 +++++++++++-- kernel/bpf/syscall.c | 62 ++++++++++++++++++++++++---------------------------- 2 files changed, 41 insertions(+), 36 deletions(-) diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index d9c9f45..aec70e0 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -134,9 +134,20 @@ static int bpf_map_offload_ndo(struct bpf_offloaded_map *offmap, static void __bpf_map_offload_destroy(struct bpf_offloaded_map *offmap) { + struct bpf_map *map = &offmap->map; + WARN_ON(bpf_map_offload_ndo(offmap, BPF_OFFLOAD_MAP_FREE)); - /* Make sure BPF_MAP_GET_NEXT_ID can't find this dead map */ - bpf_map_free_id(&offmap->map); + /* Make sure BPF_MAP_GET_NEXT_ID can't find this dead map. + * + * Offloaded maps are removed from the IDR store when their device + * disappears - even if someone holds an fd to them they are unusable, + * the memory is gone, all ops will fail; they are simply waiting for + * refcnt to drop to be freed. + */ + if (map->id) { + bpf_map_free_id(map); + map->id = 0; + } list_del_init(&offmap->offloads); offmap->netdev = NULL; } diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index f3664f2..ee1297d 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -382,30 +382,19 @@ static int bpf_map_alloc_id(struct bpf_map *map) idr_preload(GFP_KERNEL); spin_lock_bh(&map_idr_lock); id = idr_alloc_cyclic(&map_idr, map, 1, INT_MAX, GFP_ATOMIC); - if (id > 0) - map->id = id; spin_unlock_bh(&map_idr_lock); idr_preload_end(); - return id > 0 ? 0 : id; + return id; } void bpf_map_free_id(struct bpf_map *map) { unsigned long flags; - /* Offloaded maps are removed from the IDR store when their device - * disappears - even if someone holds an fd to them they are unusable, - * the memory is gone, all ops will fail; they are simply waiting for - * refcnt to drop to be freed. 
- */ - if (!map->id) - return; - spin_lock_irqsave(&map_idr_lock, flags); idr_remove(&map_idr, map->id); - map->id = 0; spin_unlock_irqrestore(&map_idr_lock, flags); } @@ -748,8 +737,11 @@ static void bpf_map_put_uref(struct bpf_map *map) void bpf_map_put(struct bpf_map *map) { if (atomic64_dec_and_test(&map->refcnt)) { - /* bpf_map_free_id() must be called first */ - bpf_map_free_id(map); + /* bpf_map_free_id() must be called first. */ + if (map->id) { + bpf_map_free_id(map); + map->id = 0; + } btf_put(map->btf); INIT_WORK(&map->work, bpf_map_free_deferred); /* Avoid spawning kworkers, since they all might contend @@ -1215,8 +1207,9 @@ static int map_create(union bpf_attr *attr) goto free_map_field_offs; err = bpf_map_alloc_id(map); - if (err) + if (err < 0) goto free_map_sec; + map->id = err; bpf_map_save_memcg(map); @@ -2024,29 +2017,18 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog) idr_preload(GFP_KERNEL); spin_lock_bh(&prog_idr_lock); id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC); - if (id > 0) - prog->aux->id = id; spin_unlock_bh(&prog_idr_lock); idr_preload_end(); - return id > 0 ? 0 : id; + return id; } void bpf_prog_free_id(struct bpf_prog *prog) { unsigned long flags; - /* cBPF to eBPF migrations are currently not in the idr store. - * Offloaded programs are removed from the store when their device - * disappears - even if someone grabs an fd to them they are unusable, - * simply waiting for refcnt to drop to be freed. - */ - if (!prog->aux->id) - return; - spin_lock_irqsave(&prog_idr_lock, flags); idr_remove(&prog_idr, prog->aux->id); - prog->aux->id = 0; spin_unlock_irqrestore(&prog_idr_lock, flags); } @@ -2091,7 +2073,15 @@ static void bpf_prog_put_deferred(struct work_struct *work) prog = aux->prog; perf_event_bpf_event(prog, PERF_BPF_EVENT_PROG_UNLOAD, 0); bpf_audit_prog(prog, BPF_AUDIT_UNLOAD); - bpf_prog_free_id(prog); + /* cBPF to eBPF migrations are currently not in the idr store. 
+ * Offloaded programs are removed from the store when their device + * disappears - even if someone grabs an fd to them they are unusable, + * simply waiting for refcnt to drop to be freed. + */ + if (prog->aux->id) { + bpf_prog_free_id(prog); + prog->aux->id = 0; + } __bpf_prog_put_noref(prog, true); } @@ -2655,8 +2645,9 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) goto free_used_maps; err = bpf_prog_alloc_id(prog); - if (err) + if (err < 0) goto free_used_maps; + prog->aux->id = err; /* Upon success of bpf_prog_alloc_id(), the BPF prog is * effectively publicly exposed. However, retrieving via @@ -2730,9 +2721,6 @@ void bpf_link_init(struct bpf_link *link, enum bpf_link_type type, static void bpf_link_free_id(int id) { - if (!id) - return; - spin_lock_bh(&link_idr_lock); idr_remove(&link_idr, id); spin_unlock_bh(&link_idr_lock); @@ -2748,7 +2736,10 @@ static void bpf_link_free_id(int id) void bpf_link_cleanup(struct bpf_link_primer *primer) { primer->link->prog = NULL; - bpf_link_free_id(primer->id); + if (primer->id) { + bpf_link_free_id(primer->id); + primer->id = 0; + } fput(primer->file); put_unused_fd(primer->fd); } @@ -2761,7 +2752,10 @@ void bpf_link_inc(struct bpf_link *link) /* bpf_link_free is guaranteed to be called from process context */ static void bpf_link_free(struct bpf_link *link) { - bpf_link_free_id(link->id); + if (link->id) { + bpf_link_free_id(link->id); + link->id = 0; + } if (link->prog) { /* detach BPF program, clean up used resources */ link->ops->release(link); From patchwork Sun Mar 26 09:22:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188016 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 
BB0C2C74A5B for ; Sun, 26 Mar 2023 09:22:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231927AbjCZJW2 (ORCPT ); Sun, 26 Mar 2023 05:22:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43098 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229596AbjCZJWY (ORCPT ); Sun, 26 Mar 2023 05:22:24 -0400 Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 559ED40EE; Sun, 26 Mar 2023 02:22:23 -0700 (PDT) Received: by mail-qt1-x832.google.com with SMTP id x1so5867823qtr.7; Sun, 26 Mar 2023 02:22:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822542; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=LqkDpvKGwevEK4sVI1Ozexg4BYNSROrbvWrzdtcWQzE=; b=gaP3awW30XRgrsax39c/QyIC65S9xvOX5NJ6A8Uvm396oWrTYMoqlb3xRoe1AJFTYi vcViKitZy7ju4PIuRlsz5fBE78ptR+jfM/hvBWvA2fwGXUwb3VdZpi1u8H4RIwuRo2u0 appsU0EZ1vBLKd2OmE7am1ADiqorhjykLhKDPYf69ho01H221OvtCczZATv0L06ozIbW 4r3B8NPEPnQkWzGG0QydnxsWfwIIleb3gYykBEjpkzZtorbvvYN53Z86ld564TWW7PEL 9NyYD8t6MIuI524FGj84dFsG6y14lcvXNc4HtgWus9cma7KuPzXiBsCr3ljXNI1lrfz9 X2TA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822542; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LqkDpvKGwevEK4sVI1Ozexg4BYNSROrbvWrzdtcWQzE=; b=xTyXDqGq/l7rnxDTPPqeB/bMionKxlGIlEnjzm/O+P72kvfZQGcc+LPKJ98kZlaGNu WWmnqKNA0FzKxz9RBPfjwxwogP/m9yW41EG+AH3lyWsFcWY4PQHHr6mkkuQ6jr+uc1xq QCEbwsaAMejZYtdq5lUPfNqcogrKY5aZh7ICBWc9/BpF8EG420fWw7TXvR/otASLaM0f 29JW1hoi+JGn7/i9InzytxEvT6devylWEzmh3J+0e/t/+/TAUqGs67U0SL2W4+NJqiPm 
2P8VmsKkVZ6tvYxJAXYBzVmd4DPjvC3zP1hUVT3MTs1/XFNHT52OJXKMlPUa2YuMA8sA O1jg== X-Gm-Message-State: AO0yUKXOaOUBF+u7w8vHnCA2Uvks1paiIi94e3SBnYkRzj6WvPLWgVAW HFt1VzrX9PKASZX8xkRwV0o= X-Google-Smtp-Source: AK7set+aJQlIoCdZzW8J6891dxzViF6iBvna1uPyP/Fg7uRMPMyIncvl8Q8sJJ+MU4+kZ09DK7sBPQ== X-Received: by 2002:a05:622a:614:b0:3e4:37ac:8203 with SMTP id z20-20020a05622a061400b003e437ac8203mr14417134qta.6.1679822542369; Sun, 26 Mar 2023 02:22:22 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:21 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 06/13] bpf: Helpers to alloc and free object id in bpf namespace Date: Sun, 26 Mar 2023 09:22:01 +0000 Message-Id: <20230326092208.13613-7-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Introduce generic helpers to alloc bpf_{map,prog,link} in bpf namespace. 
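For illustration only (this is not part of the patch): a minimal userspace sketch of the allocation scheme, assuming it mirrors how struct pid records one number per pid namespace level. An object created in a namespace at level N gets one id in every namespace from level N up to the init namespace (level 0), and a failure part-way unwinds only the levels that were already filled in. ns_model, obj_id_model, ns_alloc() and ns_release() are hypothetical stand-ins for struct bpf_namespace, struct bpf_obj_id and the per-namespace IDR; a plain counter replaces idr_alloc_cyclic().

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define MAX_LEVEL 4

struct ns_model {
	int level;               /* 0 for the init namespace */
	struct ns_model *parent; /* NULL for the init namespace */
	int next_id;             /* stand-in for a per-namespace IDR */
	int capacity;            /* ids up to and including this value exist */
};

struct obj_id_model {
	int level;
	struct {
		int nr;
		struct ns_model *ns;
	} numbers[MAX_LEVEL + 1];
};

/* Hand out the next id in @ns, or -1 when the namespace is exhausted. */
static int ns_alloc(struct ns_model *ns)
{
	if (ns->next_id > ns->capacity)
		return -1;
	return ns->next_id++;
}

/* Undo the most recent allocation in @ns (sufficient for this model). */
static void ns_release(struct ns_model *ns)
{
	ns->next_id--;
}

/*
 * Allocate one id per level, walking from the object's own namespace up
 * to the init namespace. On failure, release only the levels that were
 * actually assigned.
 */
static int obj_id_alloc(struct obj_id_model *oid, struct ns_model *ns)
{
	struct ns_model *tmp = ns;
	int i;

	oid->level = ns->level;
	for (i = ns->level; i >= 0; i--) {
		int id = ns_alloc(tmp);

		if (id < 0)
			goto rollback;
		oid->numbers[i].nr = id;
		oid->numbers[i].ns = tmp;
		tmp = tmp->parent;
	}
	return 0;

rollback:
	/* Level i itself was never assigned, so unwinding starts at i + 1. */
	for (i++; i <= ns->level; i++)
		ns_release(oid->numbers[i].ns);
	return -1;
}
```

In this model numbers[0].nr plays the role of the global id and numbers[level].nr the id seen inside the namespace that created the object.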
Signed-off-by: Yafang Shao --- include/linux/bpf_namespace.h | 36 ++++++++++++++++++ kernel/bpf/bpf_namespace.c | 86 +++++++++++++++++++++++++++++++++++++++++++ kernel/bpf/syscall.c | 6 +-- 3 files changed, 125 insertions(+), 3 deletions(-) diff --git a/include/linux/bpf_namespace.h b/include/linux/bpf_namespace.h index 06aa51f..50bd68c 100644 --- a/include/linux/bpf_namespace.h +++ b/include/linux/bpf_namespace.h @@ -38,9 +38,45 @@ struct bpf_namespace { extern struct bpf_namespace init_bpf_ns; extern struct proc_ns_operations bpfns_operations; +extern spinlock_t map_idr_lock; +extern spinlock_t prog_idr_lock; +extern spinlock_t link_idr_lock; struct bpf_namespace *copy_bpfns(unsigned long flags, struct user_namespace *user_ns, struct bpf_namespace *old_ns); void put_bpfns(struct bpf_namespace *ns); +struct bpf_obj_id *bpf_alloc_obj_id(struct bpf_namespace *ns, + void *obj, int type); +void bpf_free_obj_id(struct bpf_obj_id *obj_id, int type); + +/* + * The helpers to get a bpf object's id as seen from different namespaces + * + * bpf_obj_id_nr() : global id, i.e. the id seen from the init namespace; + * bpf_obj_id_vnr() : virtual id, i.e. the id seen from the bpf namespace of + * current. + * bpf_obj_id_nr_ns() : id seen from the ns specified.
+ * + * see also task_xid_nr() etc in include/linux/sched.h + */ +static inline int bpf_obj_id_nr(struct bpf_obj_id *obj_id) +{ + if (obj_id) + return obj_id->numbers[0].nr; + return 0; +} + +static inline int bpf_obj_id_nr_ns(struct bpf_obj_id *obj_id, + struct bpf_namespace *ns) +{ + if (obj_id && ns->level <= obj_id->level) + return obj_id->numbers[ns->level].nr; + return 0; +} + +static inline int bpf_obj_id_vnr(struct bpf_obj_id *obj_id) +{ + return bpf_obj_id_nr_ns(obj_id, current->nsproxy->bpf_ns); +} #endif /* _LINUX_BPF_ID_NS_H */ diff --git a/kernel/bpf/bpf_namespace.c b/kernel/bpf/bpf_namespace.c index 88a86cd..1e98d1d 100644 --- a/kernel/bpf/bpf_namespace.c +++ b/kernel/bpf/bpf_namespace.c @@ -217,3 +217,89 @@ static __init int bpf_namespaces_init(void) } late_initcall(bpf_namespaces_init); + +struct bpf_obj_id *bpf_alloc_obj_id(struct bpf_namespace *ns, + void *obj, int type) +{ + struct bpf_namespace *tmp = ns; + struct bpf_obj_id *obj_id; + spinlock_t *idr_lock; + unsigned long flags; + int id; + int i; + + switch (type) { + case MAP_OBJ_ID: + idr_lock = &map_idr_lock; + break; + case PROG_OBJ_ID: + idr_lock = &prog_idr_lock; + break; + case LINK_OBJ_ID: + idr_lock = &link_idr_lock; + break; + default: + return ERR_PTR(-EINVAL); + } + + obj_id = kmem_cache_alloc(ns->obj_id_cachep, GFP_KERNEL); + if (!obj_id) + return ERR_PTR(-ENOMEM); + + obj_id->level = ns->level; + for (i = ns->level; i >= 0; i--) { + idr_preload(GFP_KERNEL); + spin_lock_bh(idr_lock); + id = idr_alloc_cyclic(&tmp->idr[type], obj, 1, INT_MAX, GFP_ATOMIC); + spin_unlock_bh(idr_lock); + idr_preload_end(); + if (id < 0) + goto out_free; + obj_id->numbers[i].nr = id; + obj_id->numbers[i].ns = tmp; + tmp = tmp->parent; + } + + return obj_id; + +out_free: + for (i++; i <= ns->level; i++) { + tmp = obj_id->numbers[i].ns; + spin_lock_irqsave(idr_lock, flags); + idr_remove(&tmp->idr[type], obj_id->numbers[i].nr); + spin_unlock_irqrestore(idr_lock, flags); + } +
kmem_cache_free(ns->obj_id_cachep, obj_id); + return ERR_PTR(id); +} + +void bpf_free_obj_id(struct bpf_obj_id *obj_id, int type) +{ + struct bpf_namespace *ns; + spinlock_t *idr_lock; + unsigned long flags; + int i; + + switch (type) { + case MAP_OBJ_ID: + idr_lock = &map_idr_lock; + break; + case PROG_OBJ_ID: + idr_lock = &prog_idr_lock; + break; + case LINK_OBJ_ID: + idr_lock = &link_idr_lock; + break; + default: + return; + } + /* Note that the level-0 should be freed at last */ + for (i = obj_id->level; i >= 0; i--) { + spin_lock_irqsave(idr_lock, flags); + ns = obj_id->numbers[i].ns; + idr_remove(&ns->idr[type], obj_id->numbers[i].nr); + spin_unlock_irqrestore(idr_lock, flags); + } + ns = obj_id->numbers[obj_id->level].ns; + kmem_cache_free(ns->obj_id_cachep, obj_id); +} diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index ee1297d..f24e550 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -48,11 +48,11 @@ DEFINE_PER_CPU(int, bpf_prog_active); static DEFINE_IDR(prog_idr); -static DEFINE_SPINLOCK(prog_idr_lock); +DEFINE_SPINLOCK(prog_idr_lock); static DEFINE_IDR(map_idr); -static DEFINE_SPINLOCK(map_idr_lock); +DEFINE_SPINLOCK(map_idr_lock); static DEFINE_IDR(link_idr); -static DEFINE_SPINLOCK(link_idr_lock); +DEFINE_SPINLOCK(link_idr_lock); int sysctl_unprivileged_bpf_disabled __read_mostly = IS_BUILTIN(CONFIG_BPF_UNPRIV_DEFAULT_OFF) ? 
2 : 0; From patchwork Sun Mar 26 09:22:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188015 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EDC75C6FD1C for ; Sun, 26 Mar 2023 09:22:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231934AbjCZJW3 (ORCPT ); Sun, 26 Mar 2023 05:22:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43118 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231881AbjCZJWZ (ORCPT ); Sun, 26 Mar 2023 05:22:25 -0400 Received: from mail-qt1-x835.google.com (mail-qt1-x835.google.com [IPv6:2607:f8b0:4864:20::835]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22E9C9740; Sun, 26 Mar 2023 02:22:24 -0700 (PDT) Received: by mail-qt1-x835.google.com with SMTP id hf2so5878003qtb.3; Sun, 26 Mar 2023 02:22:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822543; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=IttOk0+8XCKRMkImj0Us7L6S+8JkbptYujvUdCdqNsM=; b=oFXEPzd8QweT0zM+KjuIeU9h8UCXV3/G5mEHY2UeKsFbQI9jhWfABcjlcjJP79D5R0 BrYviuHlEmow2FWQv2UI09K958ibw7WP/cT4i6cWTSok4yz3gG1PujG66GMT34EmoNk1 4Pl/y4hbHOjkwNA0DVjbbVSKRYmQJq0dPvp9cYNWIwxy3bHSSrewTn3muC09lf5pdPeQ of3RgXmUdfkXleoPA5a8XtV5VdG7VAiup0dGZqSRezLf4v4dNARza+2ciiMOQ10MMy0V lRfEJq1z5+5g5LxwqZhHWQbgSduSqp6OzEAr7qcST9lI3JyqsiBM2DGVGU3Q0jv/gA/Q AGWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822543; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IttOk0+8XCKRMkImj0Us7L6S+8JkbptYujvUdCdqNsM=; b=hoOrgcng4Y/JihUyBcgGbQAOZVl5PeOlf0gjYqS84eUeKsb+m39w7oTLCNkEU9NNbE SwM3vmzc+H2XU22XUZefe1Rnrr3lN5SF7LLaJi/cqOtiRl7f0DW559LYs9PorRv+Gro6 O5g7r09MPVDG/L7eELJVxnBOu7xLMFDvh28TUab6ag6n24SU+mJjM9r5GOtzQImaGfPX fiQJSE2lxcIQ9BaJXRQREGRiKfL9BNT9pHHhK4lHEefDUzc2xFY986dlVaAkQDGzM7fg kpImEu+QjBDoW9ClnRhW7qXe4sTOqulaHKN0VcxB15EZtx3gf7kBYdnxnnyiWWneDfUr fv+g== X-Gm-Message-State: AO0yUKWRbEDdW0EbVDgM01JO95VGI9L5Vxipj4x6BsN8kx9XnOOD6NeN XUY1SHY8EYh3uvhbkSn/lbI4718pgwlg6eWnp5k= X-Google-Smtp-Source: AKy350aa/Cf7JCR/SBYjM6YZ/WRcM384g/nWCSMOKsCI60gnucp+fweeRUq9R95F0hO6UD7nmmFzmg== X-Received: by 2002:a05:622a:214:b0:3e3:902a:a084 with SMTP id b20-20020a05622a021400b003e3902aa084mr14239455qtx.6.1679822543297; Sun, 26 Mar 2023 02:22:23 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:22 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 07/13] bpf: Add bpf helper to get bpf object id Date: Sun, 26 Mar 2023 09:22:02 +0000 Message-Id: <20230326092208.13613-8-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC A new bpf helper is introduced to get bpf object id in a tracing bpf prog. 
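For illustration only (not part of the patch): a userspace sketch of the lookup rule behind bpf_obj_id_vnr(), which is what the new helper returns. The id seen from a namespace is the number recorded at that namespace's level, and 0 if the object is not visible from there. The sketch additionally checks that the slot belongs to the querying namespace, as pid_nr_ns() does for pids; that check is an assumption about the intended behaviour and is not in the helper as posted. ns_model, obj_id_model and obj_id_nr_ns() are hypothetical names.

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define MAX_LEVEL 4

struct ns_model {
	int level;               /* 0 for the init namespace */
	struct ns_model *parent; /* NULL for the init namespace */
};

struct obj_id_model {
	int level;               /* level of the namespace that created it */
	struct {
		int nr;
		struct ns_model *ns;
	} numbers[MAX_LEVEL + 1];
};

/*
 * Return the id of @oid as seen from @ns, or 0 when @oid is not visible
 * from @ns (deeper namespace, or a sibling at the same level).
 */
static int obj_id_nr_ns(const struct obj_id_model *oid,
			const struct ns_model *ns)
{
	if (!oid || ns->level > oid->level)
		return 0;
	/* Assumed extra check: the slot must belong to this namespace. */
	if (oid->numbers[ns->level].ns != ns)
		return 0;
	return oid->numbers[ns->level].nr;
}
```

Without the ownership check, two sibling namespaces at the same level would read each other's numbers, since the level comparison alone cannot tell them apart.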
Signed-off-by: Yafang Shao --- include/linux/bpf.h | 1 + include/uapi/linux/bpf.h | 7 +++++++ kernel/bpf/task_iter.c | 12 ++++++++++++ kernel/trace/bpf_trace.c | 2 ++ tools/include/uapi/linux/bpf.h | 7 +++++++ 5 files changed, 29 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2d8f3f6..c94034a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2867,6 +2867,7 @@ static inline int bpf_fd_reuseport_array_update_elem(struct bpf_map *map, extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto; extern const struct bpf_func_proto bpf_cgrp_storage_get_proto; extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto; +extern const struct bpf_func_proto bpf_find_obj_id_proto; const struct bpf_func_proto *tracing_prog_func_proto( enum bpf_func_id func_id, const struct bpf_prog *prog); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e3d3b51..3009877 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5540,6 +5540,12 @@ struct bpf_stack_build_id { * 0 on success. * * **-ENOENT** if the bpf_local_storage cannot be found. + * + * int bpf_find_obj_id(void *obj_id) + * Description + * Get bpf object id in current bpf namespace. + * Return + * bpf object id is returned on success. */ #define ___BPF_FUNC_MAPPER(FN, ctx...) 
\ FN(unspec, 0, ##ctx) \ @@ -5754,6 +5760,7 @@ struct bpf_stack_build_id { FN(user_ringbuf_drain, 209, ##ctx) \ FN(cgrp_storage_get, 210, ##ctx) \ FN(cgrp_storage_delete, 211, ##ctx) \ + FN(find_obj_id, 212, ##ctx) \ /* */ /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index c4ab9d6..a551743 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -8,6 +8,7 @@ #include #include #include +#include #include "mmap_unlock_work.h" static const char * const iter_task_type_names[] = { @@ -823,6 +824,17 @@ static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struc .arg5_type = ARG_ANYTHING, }; +BPF_CALL_1(bpf_find_obj_id, void *, obj_id) +{ + return bpf_obj_id_vnr(obj_id); +} + +const struct bpf_func_proto bpf_find_obj_id_proto = { + .func = bpf_find_obj_id, + .ret_type = RET_INTEGER, + .arg1_type = ARG_ANYTHING, +}; + DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work); static void do_mmap_read_unlock(struct irq_work *entry) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index bcf91bc..977bb61 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1510,6 +1510,8 @@ static int __init bpf_key_sig_kfuncs_init(void) return &bpf_find_vma_proto; case BPF_FUNC_trace_vprintk: return bpf_get_trace_vprintk_proto(); + case BPF_FUNC_find_obj_id: + return &bpf_find_obj_id_proto; default: return bpf_base_func_proto(func_id); } diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index d6c5a02..8beacad 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5540,6 +5540,12 @@ struct bpf_stack_build_id { * 0 on success. * * **-ENOENT** if the bpf_local_storage cannot be found. + * + * int bpf_find_obj_id(void *obj_id) + * Description + * Get bpf object id in current bpf namespace. + * Return + * bpf object id is returned on success. 
*/ #define ___BPF_FUNC_MAPPER(FN, ctx...) \ FN(unspec, 0, ##ctx) \ @@ -5754,6 +5760,7 @@ struct bpf_stack_build_id { FN(user_ringbuf_drain, 209, ##ctx) \ FN(cgrp_storage_get, 210, ##ctx) \ FN(cgrp_storage_delete, 211, ##ctx) \ + FN(find_obj_id, 212, ##ctx) \ /* */ /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't From patchwork Sun Mar 26 09:22:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188017 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38C10C74A5B for ; Sun, 26 Mar 2023 09:22:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231978AbjCZJWh (ORCPT ); Sun, 26 Mar 2023 05:22:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43244 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231875AbjCZJW1 (ORCPT ); Sun, 26 Mar 2023 05:22:27 -0400 Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9EC029744; Sun, 26 Mar 2023 02:22:24 -0700 (PDT) Received: by mail-qt1-x832.google.com with SMTP id x1so5867859qtr.7; Sun, 26 Mar 2023 02:22:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822544; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=gAnrIb2gJlyVBdwgYJnkHeO4jmGKmm9oA7p2u8wyILQ=; b=HuluRPIl7ODzzyoZP/Ot+Yfvh1vDS809QCerFAqbUTJMfINtQFFxYQoMXevlC1wTLe 6pfDOgx/aIGmBCo+uZBZqCirUwnnPlzjhsw/p50ZiPNLFlZM9FAP3ur1LyPwoAOPer3T 
eKO/uDefVr6QW9mmTtb/RaXCIHSQGHRzJXzJBis9xemUkzJWtuTRGAg9ZreL4FUMbsa0 G0vVinwhSvfucXH5zocbSZCCWFy71JG0DsDmHw5+Yan66FF3I5WEVrpt1Xfw6jt23W8f y5J3h/buyjplq8rPRb9vn0Gcohr1gJSrZ5Ed6ojfZnziGCvcCXKnguffyROmMDMKXNdi vZOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822544; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gAnrIb2gJlyVBdwgYJnkHeO4jmGKmm9oA7p2u8wyILQ=; b=mnHq4yGbcFcUeIfR4cAmN49OWSCnaNMVXQE59WVnet7+2aksCQtQEuZoKC6Xx/Nu/I e8+36TDrXEBx47dkh66NbTjWZeOdfByeSm88dgzNnRw25uYFXcGx2x+Fc23ZIOtZYAvV qWcu/80SSMjmhHxuqcSrBbwK1hoSiGkxfuwg6vsHSR+R1LfrFhLN4edQyV/3fsw/KMIO zHMnOI0e4TSCrw2LYxdOk6J7vXzROcqzMFOHTzKcIINL+9XS7bozMFKV/eOHP6JyUKFB tOJ5QbSKz6H4KGMOQ0KIKm1gtjIi28VIoAYzBt44juy2KEQ/SbqLIXr0SB6kp0wksKgo o5MQ== X-Gm-Message-State: AO0yUKWpx8EAG1YQkM1i/7bbCWlmwtrFinoE2MI5PFVNAJmoDVkyfnpV FVh7TuHrr84xRa30zztxleY= X-Google-Smtp-Source: AK7set/N7UeljOUhwLWgA1VlTZpVEocvGAOCAVIuK5GyvbdfpXs5BiCuNEOADmm8KzerjN1aOdebGQ== X-Received: by 2002:a05:622a:15d2:b0:3e3:8502:fbcb with SMTP id d18-20020a05622a15d200b003e38502fbcbmr14140636qty.40.1679822544166; Sun, 26 Mar 2023 02:22:24 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:23 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 08/13] bpf: Alloc and free bpf_map id in bpf namespace Date: Sun, 26 Mar 2023 09:22:03 +0000 Message-Id: 
<20230326092208.13613-9-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC

Only the bpf map id in the current bpf namespace is exposed to user space; map->id is still the id in the init bpf namespace. The result is as follows.

Run bpftool in a new bpf namespace:

$ bpftool map show
4: array name kprobe_b.rodata flags 0x80 key 4B value 37B max_entries 1 memlock 360B btf_id 159 frozen pids kprobe(20490)
5: array name kprobe_b.data flags 0x400 key 4B value 4B max_entries 1 memlock 8192B btf_id 159 pids kprobe(20490)
$ bpftool prog show
91: kprobe name kretprobe_run tag 0de47cc241a2b1b3 gpl loaded_at 2023-03-20T22:19:16+0800 uid 0 xlated 56B jited 39B memlock 4096B map_ids 4 btf_id 159 pids kprobe(20490)
93: kprobe name kprobe_run tag bf163b23cd3b174d gpl loaded_at 2023-03-20T22:19:16+0800 uid 0 xlated 48B jited 35B memlock 4096B map_ids 4 btf_id 159 pids kprobe(20490)

At the same time, run bpftool in the init bpf namespace:

$ bpftool map show
48: array name kprobe_b.rodata flags 0x80 key 4B value 37B max_entries 1 memlock 360B btf_id 159 frozen pids kprobe(20490)
49: array name kprobe_b.data flags 0x400 key 4B value 4B max_entries 1 memlock 8192B btf_id 159 pids kprobe(20490)
$ bpftool prog show
91: kprobe name kretprobe_run tag 0de47cc241a2b1b3 gpl loaded_at 2023-03-20T22:19:16+0800 uid 0 xlated 56B jited 39B memlock 4096B map_ids 48 btf_id 159 pids kprobe(20490)
93: kprobe name kprobe_run tag bf163b23cd3b174d gpl loaded_at 2023-03-20T22:19:16+0800 uid 0 xlated 48B jited 35B memlock 4096B map_ids 48 btf_id 159 pids kprobe(20490)

In the init bpf namespace, bpftool can also show other bpf maps, but bpftool running in the new bpf namespace can't.
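For illustration only (not part of the patch): a userspace sketch of why the same map shows up as 4 in the new namespace and as 48 in the init namespace, and why the new namespace cannot see other maps. Each namespace is assumed to keep its own id table, and a lookup by id only consults the table of the caller's namespace. map_model, ns_model, ns_publish() and ns_find() are hypothetical names; a fixed array stands in for ns->idr[MAP_OBJ_ID].

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 64

struct map_model {
	const char *name;
};

struct ns_model {
	/* Stand-in for the per-namespace map IDR, indexed by id. */
	struct map_model *table[MAX_IDS];
};

/* Make @map reachable under @id inside @ns; out-of-range ids are ignored. */
static void ns_publish(struct ns_model *ns, int id, struct map_model *map)
{
	if (id > 0 && id < MAX_IDS)
		ns->table[id] = map;
}

/* Resolve @id inside @ns only; ids of other namespaces mean nothing here. */
static struct map_model *ns_find(const struct ns_model *ns, int id)
{
	if (id <= 0 || id >= MAX_IDS)
		return NULL;
	return ns->table[id];
}
```

A map created in the new namespace is published in both tables under different ids, while a map created in the init namespace is published only there, so a lookup from the new namespace finds nothing for it.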
Signed-off-by: Yafang Shao --- include/linux/bpf.h | 3 +- kernel/bpf/bpf_namespace.c | 1 + kernel/bpf/offload.c | 3 +- kernel/bpf/syscall.c | 58 ++++++++++--------------------- tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 7 +++- 5 files changed, 30 insertions(+), 42 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c94034a..2a1f19c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -29,6 +29,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; @@ -279,6 +280,7 @@ struct bpf_map { } owner; bool bypass_spec_v1; bool frozen; /* write-once; write-protected by freeze_mutex */ + struct bpf_obj_id *obj_id; }; static inline const char *btf_field_type_name(enum btf_field_type type) @@ -1939,7 +1941,6 @@ struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type, void bpf_prog_put(struct bpf_prog *prog); void bpf_prog_free_id(struct bpf_prog *prog); -void bpf_map_free_id(struct bpf_map *map); struct btf_field *btf_record_find(const struct btf_record *rec, u32 offset, u32 field_mask); diff --git a/kernel/bpf/bpf_namespace.c b/kernel/bpf/bpf_namespace.c index 1e98d1d..6a6ef70 100644 --- a/kernel/bpf/bpf_namespace.c +++ b/kernel/bpf/bpf_namespace.c @@ -11,6 +11,7 @@ #include #define MAX_BPF_NS_LEVEL 32 +DEFINE_SPINLOCK(map_idr_lock); static struct kmem_cache *bpfns_cachep; static struct kmem_cache *obj_id_cache[MAX_PID_NS_LEVEL]; static struct ns_common *bpfns_get(struct task_struct *task); diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index aec70e0..7a90ebe 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -25,6 +25,7 @@ #include #include #include +#include /* Protects offdevs, members of bpf_offload_netdev and offload members * of all progs. @@ -145,7 +146,7 @@ static void __bpf_map_offload_destroy(struct bpf_offloaded_map *offmap) * refcnt to drop to be freed. 
*/ if (map->id) { - bpf_map_free_id(map); + bpf_free_obj_id(map->obj_id, MAP_OBJ_ID); map->id = 0; } list_del_init(&offmap->offloads); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index f24e550..1335200 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -35,6 +35,7 @@ #include #include #include +#include #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \ (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \ @@ -49,8 +50,6 @@ DEFINE_PER_CPU(int, bpf_prog_active); static DEFINE_IDR(prog_idr); DEFINE_SPINLOCK(prog_idr_lock); -static DEFINE_IDR(map_idr); -DEFINE_SPINLOCK(map_idr_lock); static DEFINE_IDR(link_idr); DEFINE_SPINLOCK(link_idr_lock); @@ -375,30 +374,6 @@ void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr) map->map_extra = attr->map_extra; } -static int bpf_map_alloc_id(struct bpf_map *map) -{ - int id; - - idr_preload(GFP_KERNEL); - spin_lock_bh(&map_idr_lock); - id = idr_alloc_cyclic(&map_idr, map, 1, INT_MAX, GFP_ATOMIC); - spin_unlock_bh(&map_idr_lock); - idr_preload_end(); - - return id; -} - -void bpf_map_free_id(struct bpf_map *map) -{ - unsigned long flags; - - spin_lock_irqsave(&map_idr_lock, flags); - - idr_remove(&map_idr, map->id); - - spin_unlock_irqrestore(&map_idr_lock, flags); -} - #ifdef CONFIG_MEMCG_KMEM static void bpf_map_save_memcg(struct bpf_map *map) { @@ -737,9 +712,9 @@ static void bpf_map_put_uref(struct bpf_map *map) void bpf_map_put(struct bpf_map *map) { if (atomic64_dec_and_test(&map->refcnt)) { - /* bpf_map_free_id() must be called first. */ + /* bpf_free_obj_id() must be called first. 
*/ if (map->id) { - bpf_map_free_id(map); + bpf_free_obj_id(map->obj_id, MAP_OBJ_ID); map->id = 0; } btf_put(map->btf); @@ -817,7 +792,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp) map->map_flags, (unsigned long long)map->map_extra, bpf_map_memory_usage(map), - map->id, + bpf_obj_id_vnr(map->obj_id), READ_ONCE(map->frozen)); if (type) { seq_printf(m, "owner_prog_type:\t%u\n", type); @@ -1115,6 +1090,7 @@ static int map_create(union bpf_attr *attr) { int numa_node = bpf_map_attr_numa_node(attr); struct btf_field_offs *foffs; + struct bpf_obj_id *obj_id; struct bpf_map *map; int f_flags; int err; @@ -1206,10 +1182,11 @@ static int map_create(union bpf_attr *attr) if (err) goto free_map_field_offs; - err = bpf_map_alloc_id(map); - if (err < 0) + obj_id = bpf_alloc_obj_id(current->nsproxy->bpf_ns, map, MAP_OBJ_ID); + if (IS_ERR(obj_id)) goto free_map_sec; - map->id = err; + map->obj_id = obj_id; + map->id = bpf_obj_id_nr(obj_id); bpf_map_save_memcg(map); @@ -1217,7 +1194,7 @@ static int map_create(union bpf_attr *attr) if (err < 0) { /* failed to allocate fd. * bpf_map_put_with_uref() is needed because the above - * bpf_map_alloc_id() has published the map + * bpf_alloc_obj_id() has published the map * to the userspace and the userspace may * have refcnt-ed it through BPF_MAP_GET_FD_BY_ID. 
*/ @@ -3709,11 +3686,12 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr, struct bpf_map *bpf_map_get_curr_or_next(u32 *id) { + struct bpf_namespace *ns = current->nsproxy->bpf_ns; struct bpf_map *map; spin_lock_bh(&map_idr_lock); again: - map = idr_get_next(&map_idr, id); + map = idr_get_next(&ns->idr[MAP_OBJ_ID], id); if (map) { map = __bpf_map_inc_not_zero(map, false); if (IS_ERR(map)) { @@ -3791,6 +3769,7 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr) static int bpf_map_get_fd_by_id(const union bpf_attr *attr) { + struct bpf_namespace *ns = current->nsproxy->bpf_ns; struct bpf_map *map; u32 id = attr->map_id; int f_flags; @@ -3808,7 +3787,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr) return f_flags; spin_lock_bh(&map_idr_lock); - map = idr_find(&map_idr, id); + map = idr_find(&ns->idr[MAP_OBJ_ID], id); if (map) map = __bpf_map_inc_not_zero(map, true); else @@ -3896,7 +3875,7 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog, map = bpf_map_from_imm(prog, imm, &off, &type); if (map) { insns[i].src_reg = type; - insns[i].imm = map->id; + insns[i].imm = bpf_obj_id_vnr(map->obj_id); insns[i + 1].imm = off; continue; } @@ -3978,7 +3957,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, u32 i; for (i = 0; i < ulen; i++) - if (put_user(prog->aux->used_maps[i]->id, + if (put_user(bpf_obj_id_vnr(prog->aux->used_maps[i]->obj_id), &user_map_ids[i])) { mutex_unlock(&prog->aux->used_maps_mutex); return -EFAULT; @@ -4242,7 +4221,7 @@ static int bpf_map_get_info_by_fd(struct file *file, memset(&info, 0, sizeof(info)); info.type = map->map_type; - info.id = map->id; + info.id = bpf_obj_id_vnr(map->obj_id); info.key_size = map->key_size; info.value_size = map->value_size; info.max_entries = map->max_entries; @@ -4994,6 +4973,7 @@ static int bpf_prog_bind_map(union bpf_attr *attr) static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) { + struct bpf_namespace *ns = 
current->nsproxy->bpf_ns; union bpf_attr attr; bool capable; int err; @@ -5072,7 +5052,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) break; case BPF_MAP_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, - &map_idr, &map_idr_lock); + &ns->idr[MAP_OBJ_ID], &map_idr_lock); break; case BPF_BTF_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, diff --git a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c index eb05ea5..a71aef7 100644 --- a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c +++ b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c @@ -24,11 +24,14 @@ enum bpf_obj_type { static __always_inline __u32 get_obj_id(void *ent, enum bpf_obj_type type) { + void *obj_id; + switch (type) { case BPF_OBJ_PROG: return BPF_CORE_READ((struct bpf_prog *)ent, aux, id); case BPF_OBJ_MAP: - return BPF_CORE_READ((struct bpf_map *)ent, id); + obj_id = BPF_CORE_READ((struct bpf_map *)ent, obj_id); + break; case BPF_OBJ_BTF: return BPF_CORE_READ((struct btf *)ent, id); case BPF_OBJ_LINK: @@ -36,6 +39,8 @@ static __always_inline __u32 get_obj_id(void *ent, enum bpf_obj_type type) default: return 0; } + + return bpf_find_obj_id(obj_id); } /* could be used only with BPF_LINK_TYPE_PERF_EVENT links */ From patchwork Sun Mar 26 09:22:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188018 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5DF6AC74A5B for ; Sun, 26 Mar 2023 09:22:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231977AbjCZJWo (ORCPT ); Sun, 26 Mar 2023 05:22:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44040 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231970AbjCZJWg (ORCPT ); Sun, 26 Mar 2023 05:22:36 -0400 Received: from mail-qt1-x82e.google.com (mail-qt1-x82e.google.com [IPv6:2607:f8b0:4864:20::82e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 512169761; Sun, 26 Mar 2023 02:22:26 -0700 (PDT) Received: by mail-qt1-x82e.google.com with SMTP id cn12so2512279qtb.8; Sun, 26 Mar 2023 02:22:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822545; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=yzrxwrWAnCBXqogVtCjOEq28yTqtM3iptGUsgJEuXWI=; b=EOer77E0LtfFk5y/lMefHgP5KaR43zAlptvyxKct+Vk51+7/uX1YJjJTUCQui/7SwY jZb/qCgxfZs5JvSKJMWt+3T1/2kFjpKtlw6tWsDDhcOhFMIs4+v3McHXz+Dm0hD3BqxP nozGYDtGnh9ZU0zVbyelEFydKVwi4atm1jyOhA+0icYhB7BDM2KZNIsuOtZuTj2u7dtN p7FpSKE7UzzpjtyNR6TrkL1m2eaoQIb3TFZQ43E2HWwmVg1HTMXx17GvnAnxwS3vsOPG QcOt+QgG+dW9jmzy7iC0FGDaLidTptgmBrRPRU3dV3x/wAUTwKrW7WxRq4HSxeBiDBaJ c5rQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822545; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=yzrxwrWAnCBXqogVtCjOEq28yTqtM3iptGUsgJEuXWI=; b=LY9zQYhAy9i/o6imx9biMdqaE56naPV14XtfihQ85Owe6LLrFqNZWaoGiuRXKi0Z3g QXrAY2l9xwWcQZlCYo54jSi76Pz/XmrAhQCDinLbpSRDYjcF169x4H0ozuga2z3cc1Bx AK9vyvUnackr6C9X85s2xvpUdp8AuioRyTVuep7IGqfkB0hf4ofSyRvRfR4ZU18BAMS/ W5/QqQwiNES2MznlrhEzBBUUmT2RBUklVKdyJFUlrO5I7w2oai42gUbaww5Y0vSWBTgc yWF376ZHj7CxKO9LQXMf7CfRTJJ2bp+uTbY+gJuo0T3mXdZyWGnGqtPcAnMtHVfpyoB0 c47w== X-Gm-Message-State: AAQBX9fl9iH9EuWEI+k3NnxwDgv3xY8OwvGKgJ8RW9bPTD1hM/MjPTIx xSwLCyY7NIwPIDwb3SuGu0w= X-Google-Smtp-Source: AK7set8M6zfz3X5CPC7/DoJ9uXrvklPCsodJDcqWq+WX69uwYe2dRtEx8t4rk6jrJVpxkD+8zVsWhw== X-Received: by 
2002:a05:622a:199a:b0:3d7:db35:f2c1 with SMTP id u26-20020a05622a199a00b003d7db35f2c1mr14762506qtc.15.1679822545146; Sun, 26 Mar 2023 02:22:25 -0700 (PDT)
Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:24 -0700 (PDT)
From: Yafang Shao
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [RFC PATCH bpf-next 09/13] bpf: Alloc and free bpf_prog id in bpf namespace
Date: Sun, 26 Mar 2023 09:22:04 +0000
Message-Id: <20230326092208.13613-10-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com>
References: <20230326092208.13613-1-laoar.shao@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Similar to bpf maps, we only expose the bpf prog id in the current bpf namespace to the user. prog->aux->id is still the id in the init bpf namespace. The ids reported for used_maps are likewise the ids in the current bpf namespace.

The result is as follows.

Run bpftool in the new bpf namespace:

    $ bpftool map show
    4: array name kprobe_b.rodata flags 0x80
        key 4B value 37B max_entries 1 memlock 360B
        btf_id 96 frozen
        pids kprobe(8790)
    5: array name kprobe_b.data flags 0x400
        key 4B value 4B max_entries 1 memlock 8192B
        btf_id 96
        pids kprobe(8790)

    $ bpftool prog show
    7: kprobe name kretprobe_run tag 0de47cc241a2b1b3 gpl
        loaded_at 2023-03-21T10:20:58+0800 uid 0
        xlated 56B jited 39B memlock 4096B map_ids 4
        btf_id 96
    9: kprobe name kprobe_run tag bf163b23cd3b174d gpl
        loaded_at 2023-03-21T10:20:58+0800 uid 0
        xlated 48B jited 35B memlock 4096B map_ids 4
        btf_id 96

At the same time, run bpftool in the init bpf namespace:

    $ bpftool map show
    18: array name kprobe_b.rodata flags 0x80
        key 4B value 37B max_entries 1 memlock 360B
        btf_id 96 frozen
        pids kprobe(8790)
    19: array name kprobe_b.data flags 0x400
        key 4B value 4B max_entries 1 memlock 8192B
        btf_id 96
        pids kprobe(8790)

    $ bpftool prog show
    29: kprobe name kretprobe_run tag 0de47cc241a2b1b3 gpl
        loaded_at 2023-03-21T10:20:58+0800 uid 0
        xlated 56B jited 39B memlock 4096B map_ids 18
        btf_id 96
        pids kprobe(8790)
    31: kprobe name kprobe_run tag bf163b23cd3b174d gpl
        loaded_at 2023-03-21T10:20:58+0800 uid 0
        xlated 48B jited 35B memlock 4096B map_ids 18
        btf_id 96
        pids kprobe(8790)

In the init bpf namespace, bpftool can also show other bpf progs, whereas bpftool running in the new bpf namespace cannot.
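
The relationship between the two listings above can be modelled in userspace. What follows is a minimal sketch, not the kernel implementation: every toy_* name is hypothetical, a plain counter stands in for the per-namespace IDR, and TOY_MAX_LEVEL is an arbitrary small bound. It assumes, as the output suggests, that an object created in a nested bpf namespace receives one id per level, from the init namespace down to its own. Returning 0 for an object that is not visible from a namespace is also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 4 /* hypothetical; the series defines MAX_BPF_NS_LEVEL as 32 */

/* Hypothetical stand-in for a bpf namespace. */
struct toy_ns {
	int level;             /* 0 for the init namespace */
	struct toy_ns *parent; /* NULL for the init namespace */
	int next_id;           /* counter standing in for the per-namespace IDR */
};

/* Hypothetical stand-in for struct bpf_obj_id: one number per level. */
struct toy_obj_id {
	int level;                              /* level of the owning namespace */
	int nr[TOY_MAX_LEVEL];                  /* id of the object at each level */
	const struct toy_ns *ns[TOY_MAX_LEVEL]; /* namespace at each level */
};

/* Allocate an id in ns and in every ancestor of ns. */
static void toy_alloc_obj_id(struct toy_obj_id *id, struct toy_ns *ns)
{
	struct toy_ns *cur;

	assert(ns->level >= 0 && ns->level < TOY_MAX_LEVEL);
	id->level = ns->level;
	for (cur = ns; cur != NULL; cur = cur->parent) {
		id->ns[cur->level] = cur;
		id->nr[cur->level] = cur->next_id++;
	}
}

/* Id as seen from the init namespace; models the role of bpf_obj_id_nr(). */
static int toy_obj_id_nr(const struct toy_obj_id *id)
{
	return id->nr[0];
}

/* Id as seen from ns, or 0 (sketch assumption) if the object does not
 * belong to ns or to one of its descendants; models the role of
 * bpf_obj_id_nr_ns(). */
static int toy_obj_id_nr_ns(const struct toy_obj_id *id, const struct toy_ns *ns)
{
	if (ns->level <= id->level && id->ns[ns->level] == ns)
		return id->nr[ns->level];
	return 0;
}
```

In this model, with an init namespace and one child whose counters both start at 1, an object allocated in the child after one object already exists in the init namespace is numbered 1 in the child and 2 in the init namespace. That is the same kind of split as map 4 versus map 18 in the listings.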
Signed-off-by: Yafang Shao --- include/linux/bpf.h | 3 +- kernel/bpf/bpf_namespace.c | 1 + kernel/bpf/syscall.c | 56 ++++++++++--------------------- tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 3 +- 4 files changed, 22 insertions(+), 41 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2a1f19c..16f2a01 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1416,6 +1416,7 @@ struct bpf_prog_aux { struct work_struct work; struct rcu_head rcu; }; + struct bpf_obj_id *obj_id; }; struct bpf_prog { @@ -1940,8 +1941,6 @@ struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type, struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog); void bpf_prog_put(struct bpf_prog *prog); -void bpf_prog_free_id(struct bpf_prog *prog); - struct btf_field *btf_record_find(const struct btf_record *rec, u32 offset, u32 field_mask); void btf_record_free(struct btf_record *rec); diff --git a/kernel/bpf/bpf_namespace.c b/kernel/bpf/bpf_namespace.c index 6a6ef70..8c70945 100644 --- a/kernel/bpf/bpf_namespace.c +++ b/kernel/bpf/bpf_namespace.c @@ -12,6 +12,7 @@ #define MAX_BPF_NS_LEVEL 32 DEFINE_SPINLOCK(map_idr_lock); +DEFINE_SPINLOCK(prog_idr_lock); static struct kmem_cache *bpfns_cachep; static struct kmem_cache *obj_id_cache[MAX_PID_NS_LEVEL]; static struct ns_common *bpfns_get(struct task_struct *task); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 1335200..4725924 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -48,8 +48,6 @@ #define BPF_OBJ_FLAG_MASK (BPF_F_RDONLY | BPF_F_WRONLY) DEFINE_PER_CPU(int, bpf_prog_active); -static DEFINE_IDR(prog_idr); -DEFINE_SPINLOCK(prog_idr_lock); static DEFINE_IDR(link_idr); DEFINE_SPINLOCK(link_idr_lock); @@ -1983,32 +1981,10 @@ static void bpf_audit_prog(const struct bpf_prog *prog, unsigned int op) if (unlikely(!ab)) return; audit_log_format(ab, "prog-id=%u op=%s", - prog->aux->id, bpf_audit_str[op]); + bpf_obj_id_vnr(prog->aux->obj_id), 
bpf_audit_str[op]); audit_log_end(ab); } -static int bpf_prog_alloc_id(struct bpf_prog *prog) -{ - int id; - - idr_preload(GFP_KERNEL); - spin_lock_bh(&prog_idr_lock); - id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC); - spin_unlock_bh(&prog_idr_lock); - idr_preload_end(); - - return id; -} - -void bpf_prog_free_id(struct bpf_prog *prog) -{ - unsigned long flags; - - spin_lock_irqsave(&prog_idr_lock, flags); - idr_remove(&prog_idr, prog->aux->id); - spin_unlock_irqrestore(&prog_idr_lock, flags); -} - static void __bpf_prog_put_rcu(struct rcu_head *rcu) { struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu); @@ -2056,7 +2032,7 @@ static void bpf_prog_put_deferred(struct work_struct *work) * simply waiting for refcnt to drop to be freed. */ if (prog->aux->id) { - bpf_prog_free_id(prog); + bpf_free_obj_id(prog->aux->obj_id, PROG_OBJ_ID); prog->aux->id = 0; } __bpf_prog_put_noref(prog, true); @@ -2157,7 +2133,7 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp) prog->jited, prog_tag, prog->pages * 1ULL << PAGE_SHIFT, - prog->aux->id, + bpf_obj_id_vnr(prog->aux->obj_id), stats.nsecs, stats.cnt, stats.misses, @@ -2468,6 +2444,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) enum bpf_prog_type type = attr->prog_type; struct bpf_prog *prog, *dst_prog = NULL; struct btf *attach_btf = NULL; + struct bpf_obj_id *obj_id; int err; char license[128]; bool is_gpl; @@ -2621,12 +2598,13 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) if (err < 0) goto free_used_maps; - err = bpf_prog_alloc_id(prog); - if (err < 0) + obj_id = bpf_alloc_obj_id(current->nsproxy->bpf_ns, prog, PROG_OBJ_ID); + if (IS_ERR(obj_id)) goto free_used_maps; - prog->aux->id = err; + prog->aux->obj_id = obj_id; + prog->aux->id = bpf_obj_id_nr(obj_id); - /* Upon success of bpf_prog_alloc_id(), the BPF prog is + /* Upon success of bpf_alloc_obj_id(), the BPF prog is * effectively publicly exposed. 
However, retrieving via * bpf_prog_get_fd_by_id() will take another reference, * therefore it cannot be gone underneath us. @@ -2803,7 +2781,7 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp) "prog_tag:\t%s\n" "prog_id:\t%u\n", prog_tag, - prog->aux->id); + bpf_obj_id_vnr(prog->aux->obj_id)); } if (link->ops->show_fdinfo) link->ops->show_fdinfo(link, m); @@ -3706,11 +3684,12 @@ struct bpf_map *bpf_map_get_curr_or_next(u32 *id) struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id) { + struct bpf_namespace *ns = current->nsproxy->bpf_ns; struct bpf_prog *prog; spin_lock_bh(&prog_idr_lock); again: - prog = idr_get_next(&prog_idr, id); + prog = idr_get_next(&ns->idr[PROG_OBJ_ID], id); if (prog) { prog = bpf_prog_inc_not_zero(prog); if (IS_ERR(prog)) { @@ -3727,13 +3706,14 @@ struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id) struct bpf_prog *bpf_prog_by_id(u32 id) { + struct bpf_namespace *ns = current->nsproxy->bpf_ns; struct bpf_prog *prog; if (!id) return ERR_PTR(-ENOENT); spin_lock_bh(&prog_idr_lock); - prog = idr_find(&prog_idr, id); + prog = idr_find(&ns->idr[PROG_OBJ_ID], id); if (prog) prog = bpf_prog_inc_not_zero(prog); else @@ -3939,7 +3919,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, return -EFAULT; info.type = prog->type; - info.id = prog->aux->id; + info.id = bpf_obj_id_vnr(prog->aux->obj_id); info.load_time = prog->aux->load_time; info.created_by_uid = from_kuid_munged(current_user_ns(), prog->aux->user->uid); @@ -4287,7 +4267,7 @@ static int bpf_link_get_info_by_fd(struct file *file, info.type = link->type; info.id = link->id; if (link->prog) - info.prog_id = link->prog->aux->id; + info.prog_id = bpf_obj_id_vnr(link->prog->aux->obj_id); if (link->ops->fill_link_info) { err = link->ops->fill_link_info(link, &info); @@ -4452,7 +4432,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr, struct bpf_raw_event_map *btp = raw_tp->btp; err = bpf_task_fd_query_copy(attr, uattr, - raw_tp->link.prog->aux->id, + 
bpf_obj_id_vnr(raw_tp->link.prog->aux->obj_id), BPF_FD_TYPE_RAW_TRACEPOINT, btp->tp->name, 0, 0); goto put_file; @@ -5048,7 +5028,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) break; case BPF_PROG_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, - &prog_idr, &prog_idr_lock); + &ns->idr[PROG_OBJ_ID], &prog_idr_lock); break; case BPF_MAP_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, diff --git a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c index a71aef7..1fd8ceb 100644 --- a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c +++ b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c @@ -28,7 +28,8 @@ static __always_inline __u32 get_obj_id(void *ent, enum bpf_obj_type type) switch (type) { case BPF_OBJ_PROG: - return BPF_CORE_READ((struct bpf_prog *)ent, aux, id); + obj_id = BPF_CORE_READ((struct bpf_prog *)ent, aux, obj_id); + break; case BPF_OBJ_MAP: obj_id = BPF_CORE_READ((struct bpf_map *)ent, obj_id); break; From patchwork Sun Mar 26 09:22:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188020 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5687EC6FD1C for ; Sun, 26 Mar 2023 09:23:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232118AbjCZJXF (ORCPT ); Sun, 26 Mar 2023 05:23:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44342 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232019AbjCZJWn (ORCPT ); Sun, 26 Mar 2023 05:22:43 -0400 Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) by lindbergh.monkeyblade.net (Postfix) with 
ESMTPS id 5F63B9775; Sun, 26 Mar 2023 02:22:27 -0700 (PDT) Received: by mail-qt1-x836.google.com with SMTP id r5so5877097qtp.4; Sun, 26 Mar 2023 02:22:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822546; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ciAal/rEXFScF3qvMlCUWJgominifo4LMMWsWoEwMYs=; b=Gyf1IL9gVS0wtpFR0kRiBs7uCodqNZyxVG45yqVX+yuoIEpw2QOioq6tmmJ9YLIEgC oT52OXhprtvLBdVOOLWXTiwfQjjoX6sgH6ZbTBUPN12KpSDRYfxo2oJ+aplDSw4+E5dd 4fGI/BWn0Y4SlNEHBWThgoKvxtJAfSs+JzoK4ajxz6jEwRIuff24dUmvRaa0TJUIXbFz KfEL9Q5CvF2FxK4IOBdUYAbxfg059gBprx/IpH7/WEknO7zP0FYZMf2M78hE40M9IVzw eCk0NQh2AHUTXyuosUbxtm6eAo3nm1zgEeRw9bWmeBVCNS1HufHtHyF1Iiwu2jbeIrzo MlTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822546; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ciAal/rEXFScF3qvMlCUWJgominifo4LMMWsWoEwMYs=; b=nuioiVUL1dF4q/dev8TJgiO9PHO06YM34REYP45Lps75cPjYTD0htOauYEMdLEpRDh v3o/qxSrJBU3w0x+eDa4blr+1DJFa4xEd24XycAVp40juXgBrIx9nq6Rwd88JsSqQrCg mTEa1FcN5DMwF74SlWGT1bxVn7+ifo7wJfKiKwYv42+g1pi6pN7N1WLHYlMMmz+AbOGB ot6Whf1nu7skPLyXvXQXmmbsw9Dh91GibYXRyg9pVy+/FooK8cbAphsAJUTPKd5wXPEY IMAIDNvS4EHEbL6VsXwZ9lcMEiTCzrI2w4bTKULPIJHsZE996vr/gt3fvymHW07aoYcM V4+g== X-Gm-Message-State: AO0yUKVk8yS6mJBaM2dYbGpc4FfkRKFcJD8PCQ9CpeH6tvLNlTepi7nl 3bXuWAdNxAw6VFc1iP3Sy9g2/3znClrEi0MNX/8= X-Google-Smtp-Source: AK7set/Tu1m404G1UMWbUZYLpO7ogYjIDGBCA6cAP+5qHVmqVs/C5xcahi1CKlAOun6clKaIJLQLKw== X-Received: by 2002:a05:622a:1748:b0:3ba:2203:6c92 with SMTP id l8-20020a05622a174800b003ba22036c92mr15755300qtk.10.1679822546087; Sun, 26 Mar 2023 02:22:26 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id 
y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:25 -0700 (PDT)
From: Yafang Shao
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [RFC PATCH bpf-next 10/13] bpf: Alloc and free bpf_link id in bpf namespace
Date: Sun, 26 Mar 2023 09:22:05 +0000
Message-Id: <20230326092208.13613-11-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com>
References: <20230326092208.13613-1-laoar.shao@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Similar to bpf maps, we only expose the bpf link id in the current bpf namespace to the user. link->id is still the id in the init bpf namespace.

The result is as follows.

Run bpftool in a new bpf namespace:

    $ bpftool map show
    4: array name kprobe_b.rodata flags 0x80
        key 4B value 37B max_entries 1 memlock 360B
        btf_id 79 frozen
        pids kprobe(8322)
    5: array name kprobe_b.data flags 0x400
        key 4B value 4B max_entries 1 memlock 8192B
        btf_id 79
        pids kprobe(8322)

    $ bpftool prog show
    7: kprobe name kretprobe_run tag 0de47cc241a2b1b3 gpl
        loaded_at 2023-03-21T13:54:34+0800 uid 0
        xlated 56B jited 39B memlock 4096B map_ids 4
        btf_id 79
        pids kprobe(8322)
    9: kprobe name kprobe_run tag bf163b23cd3b174d gpl
        loaded_at 2023-03-21T13:54:34+0800 uid 0
        xlated 48B jited 35B memlock 4096B map_ids 4
        btf_id 79
        pids kprobe(8322)

    $ bpftool link show
    1: perf_event prog 9
        bpf_cookie 0
        pids kprobe(8322)
    2: perf_event prog 7
        bpf_cookie 0
        pids kprobe(8322)

At the same time, run bpftool in the init bpf namespace:

    $ bpftool map show
    8: array name kprobe_b.rodata flags 0x80
        key 4B value 37B max_entries 1 memlock 360B
        btf_id 79 frozen
        pids kprobe(8322)
    9: array name kprobe_b.data flags 0x400
        key 4B value 4B max_entries 1 memlock 8192B
        btf_id 79
        pids kprobe(8322)

    $ bpftool prog show
    15: kprobe name kretprobe_run tag 0de47cc241a2b1b3 gpl
        loaded_at 2023-03-21T13:54:34+0800 uid 0
        xlated 56B jited 39B memlock 4096B map_ids 8
        btf_id 79
        pids kprobe(8322)
    17: kprobe name kprobe_run tag bf163b23cd3b174d gpl
        loaded_at 2023-03-21T13:54:34+0800 uid 0
        xlated 48B jited 35B memlock 4096B map_ids 8
        btf_id 79
        pids kprobe(8322)

    $ bpftool link show
    2: perf_event prog 17
        bpf_cookie 0
        pids kprobe(8322)
    3: perf_event prog 15
        bpf_cookie 0
        pids kprobe(8322)

bpftool running in the init bpf namespace can also show other bpf links, but bpftool running in the new bpf namespace can only show the links in its own bpf namespace.
Signed-off-by: Yafang Shao --- include/linux/bpf.h | 2 ++ kernel/bpf/bpf_namespace.c | 1 + kernel/bpf/syscall.c | 55 +++++++++++-------------------- tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 3 +- 4 files changed, 24 insertions(+), 37 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 16f2a01..efa14ac 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1468,6 +1468,7 @@ struct bpf_link { const struct bpf_link_ops *ops; struct bpf_prog *prog; struct work_struct work; + struct bpf_obj_id *obj_id; }; struct bpf_link_ops { @@ -1506,6 +1507,7 @@ struct bpf_link_primer { struct file *file; int fd; u32 id; + struct bpf_obj_id *obj_id; }; struct bpf_struct_ops_value; diff --git a/kernel/bpf/bpf_namespace.c b/kernel/bpf/bpf_namespace.c index 8c70945..c7d62ef 100644 --- a/kernel/bpf/bpf_namespace.c +++ b/kernel/bpf/bpf_namespace.c @@ -13,6 +13,7 @@ #define MAX_BPF_NS_LEVEL 32 DEFINE_SPINLOCK(map_idr_lock); DEFINE_SPINLOCK(prog_idr_lock); +DEFINE_SPINLOCK(link_idr_lock); static struct kmem_cache *bpfns_cachep; static struct kmem_cache *obj_id_cache[MAX_PID_NS_LEVEL]; static struct ns_common *bpfns_get(struct task_struct *task); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 4725924..855d5f7 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -48,8 +48,6 @@ #define BPF_OBJ_FLAG_MASK (BPF_F_RDONLY | BPF_F_WRONLY) DEFINE_PER_CPU(int, bpf_prog_active); -static DEFINE_IDR(link_idr); -DEFINE_SPINLOCK(link_idr_lock); int sysctl_unprivileged_bpf_disabled __read_mostly = IS_BUILTIN(CONFIG_BPF_UNPRIV_DEFAULT_OFF) ? 
2 : 0; @@ -2670,17 +2668,11 @@ void bpf_link_init(struct bpf_link *link, enum bpf_link_type type, atomic64_set(&link->refcnt, 1); link->type = type; link->id = 0; + link->obj_id = NULL; link->ops = ops; link->prog = prog; } -static void bpf_link_free_id(int id) -{ - spin_lock_bh(&link_idr_lock); - idr_remove(&link_idr, id); - spin_unlock_bh(&link_idr_lock); -} - /* Clean up bpf_link and corresponding anon_inode file and FD. After * anon_inode is created, bpf_link can't be just kfree()'d due to deferred * anon_inode's release() call. This helper marksbpf_link as @@ -2692,7 +2684,7 @@ void bpf_link_cleanup(struct bpf_link_primer *primer) { primer->link->prog = NULL; if (primer->id) { - bpf_link_free_id(primer->id); + bpf_free_obj_id(primer->obj_id, LINK_OBJ_ID); primer->id = 0; } fput(primer->file); @@ -2708,7 +2700,7 @@ void bpf_link_inc(struct bpf_link *link) static void bpf_link_free(struct bpf_link *link) { if (link->id) { - bpf_link_free_id(link->id); + bpf_free_obj_id(link->obj_id, LINK_OBJ_ID); link->id = 0; } if (link->prog) { @@ -2774,7 +2766,7 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp) "link_type:\t%s\n" "link_id:\t%u\n", bpf_link_type_strs[link->type], - link->id); + bpf_obj_id_vnr(link->obj_id)); if (prog) { bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); seq_printf(m, @@ -2797,19 +2789,6 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp) .write = bpf_dummy_write, }; -static int bpf_link_alloc_id(struct bpf_link *link) -{ - int id; - - idr_preload(GFP_KERNEL); - spin_lock_bh(&link_idr_lock); - id = idr_alloc_cyclic(&link_idr, link, 1, INT_MAX, GFP_ATOMIC); - spin_unlock_bh(&link_idr_lock); - idr_preload_end(); - - return id; -} - /* Prepare bpf_link to be exposed to user-space by allocating anon_inode file, * reserving unused FD and allocating ID from link_idr. 
This is to be paired * with bpf_link_settle() to install FD and ID and expose bpf_link to @@ -2825,23 +2804,23 @@ static int bpf_link_alloc_id(struct bpf_link *link) */ int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer) { + struct bpf_obj_id *obj_id; struct file *file; - int fd, id; + int fd; fd = get_unused_fd_flags(O_CLOEXEC); if (fd < 0) return fd; - - id = bpf_link_alloc_id(link); - if (id < 0) { + obj_id = bpf_alloc_obj_id(current->nsproxy->bpf_ns, link, LINK_OBJ_ID); + if (IS_ERR(obj_id)) { put_unused_fd(fd); - return id; + return PTR_ERR(obj_id); } file = anon_inode_getfile("bpf_link", &bpf_link_fops, link, O_CLOEXEC); if (IS_ERR(file)) { - bpf_link_free_id(id); + bpf_free_obj_id(obj_id, LINK_OBJ_ID); put_unused_fd(fd); return PTR_ERR(file); } @@ -2849,7 +2828,8 @@ int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer) primer->link = link; primer->file = file; primer->fd = fd; - primer->id = id; + primer->id = bpf_obj_id_nr(obj_id); + primer->obj_id = obj_id; return 0; } @@ -2858,6 +2838,7 @@ int bpf_link_settle(struct bpf_link_primer *primer) /* make bpf_link fetchable by ID */ spin_lock_bh(&link_idr_lock); primer->link->id = primer->id; + primer->link->obj_id = primer->obj_id; spin_unlock_bh(&link_idr_lock); /* make bpf_link fetchable by FD */ fd_install(primer->fd, primer->file); @@ -4265,7 +4246,7 @@ static int bpf_link_get_info_by_fd(struct file *file, return -EFAULT; info.type = link->type; - info.id = link->id; + info.id = bpf_obj_id_vnr(link->obj_id); if (link->prog) info.prog_id = bpf_obj_id_vnr(link->prog->aux->obj_id); @@ -4748,6 +4729,7 @@ static struct bpf_link *bpf_link_inc_not_zero(struct bpf_link *link) struct bpf_link *bpf_link_by_id(u32 id) { + struct bpf_namespace *ns = current->nsproxy->bpf_ns; struct bpf_link *link; if (!id) @@ -4755,7 +4737,7 @@ struct bpf_link *bpf_link_by_id(u32 id) spin_lock_bh(&link_idr_lock); /* before link is "settled", ID is 0, pretend it doesn't exist yet */ - link = 
idr_find(&link_idr, id); + link = idr_find(&ns->idr[LINK_OBJ_ID], id); if (link) { if (link->id) link = bpf_link_inc_not_zero(link); @@ -4770,11 +4752,12 @@ struct bpf_link *bpf_link_by_id(u32 id) struct bpf_link *bpf_link_get_curr_or_next(u32 *id) { + struct bpf_namespace *ns = current->nsproxy->bpf_ns; struct bpf_link *link; spin_lock_bh(&link_idr_lock); again: - link = idr_get_next(&link_idr, id); + link = idr_get_next(&ns->idr[LINK_OBJ_ID], id); if (link) { link = bpf_link_inc_not_zero(link); if (IS_ERR(link)) { @@ -5086,7 +5069,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) break; case BPF_LINK_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, - &link_idr, &link_idr_lock); + &ns->idr[LINK_OBJ_ID], &link_idr_lock); break; case BPF_ENABLE_STATS: err = bpf_enable_stats(&attr); diff --git a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c index 1fd8ceb..e2237ad 100644 --- a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c +++ b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c @@ -36,7 +36,8 @@ static __always_inline __u32 get_obj_id(void *ent, enum bpf_obj_type type) case BPF_OBJ_BTF: return BPF_CORE_READ((struct btf *)ent, id); case BPF_OBJ_LINK: - return BPF_CORE_READ((struct bpf_link *)ent, id); + obj_id = BPF_CORE_READ((struct bpf_link *)ent, obj_id); + break; default: return 0; } From patchwork Sun Mar 26 09:22:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188019 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EAD2FC74A5B for ; Sun, 26 Mar 2023 09:22:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232013AbjCZJWv (ORCPT ); Sun, 26 Mar 2023 
05:22:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44302 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231990AbjCZJWm (ORCPT ); Sun, 26 Mar 2023 05:22:42 -0400 Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6050F9778; Sun, 26 Mar 2023 02:22:27 -0700 (PDT) Received: by mail-qt1-x832.google.com with SMTP id x1so5867903qtr.7; Sun, 26 Mar 2023 02:22:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822547; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=jEt9rMCN/8rHzWzG6uXe7mNBDuuwgA3Px4yTSwTLem8=; b=Uq763ijSODYcUZCFGo67S6aCcVOADK/sW5Dhphk+A3w7DCkjTLE3pF/aqOMwlZOJ/e NTRke8IFf0+MSL1qxNnqFz/iFP8hN65vi+hsuhyAbf/D5GuqzYt3tc3VtVuz1Jm1H2Zu wy8odXBWigyiwb6xVj921Elg9aE4zfBRM2iTXwDBET2J2xDqQ4MZhDso/JojYggxaP7N sXtCT3qDkvU2yA9D/XxSC+Bes1Ckpji9D6vFx2wF9mST5N+oSXOt1S+7/0kqIvDQ+A7M Ij2VnMIWvehC+Om9PcFQNwRRYYDpE5JAL8bOCUfFuW5GlHFqoY75eZoO8dLMUh2FKZUy wBWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822547; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jEt9rMCN/8rHzWzG6uXe7mNBDuuwgA3Px4yTSwTLem8=; b=MzL/ppXaWW/3uJf8XBwHoKaDb0oIgALXRx9K1Ip4nPKa/GwscLhuuNqSd/ptv6VPtO UbLjxeJkMm1GlHS/xcm1gortBRfkbTdUBvghYeZ+IMdYHb2yT2+Vz0fM2+2ib2cl3oKp v+O8d0HO4ZinVrkPatvRUMDNj24wHCcIBwL6m+TKRffFzTthDTM4XILDXufzQ1kgnS+n HC0Oa2CYnP+sMZoZYQPWJYkAO7VyrecndWa0fs1a8Af2la0uvL1k/iIqTZw9GDonv5X3 sxGRDVg77j5E1pNOH6izR65ips1iOsX/Tw+m3WCSV5IGnBaxDaqp4z7Ox2N34ZYgUs+g k+Uw== X-Gm-Message-State: AO0yUKXGKlMKp81tJLUkfqNr5eF3a/FBZ+1HkkFBcV1mL49opjrN0b4X kItu2/qoOoMmsz91EEMxo7E= X-Google-Smtp-Source: 
AKy350axGSAtaUnJ8cD1yseHz0qq67FsBed/yK7GOazVxmcNTUFHjtBsY5jl3LMh3vdcC4gu0RdDDg== X-Received: by 2002:a05:622a:199a:b0:3bf:e4da:2367 with SMTP id u26-20020a05622a199a00b003bfe4da2367mr15703885qtc.3.1679822547030; Sun, 26 Mar 2023 02:22:27 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:26 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 11/13] bpf: Allow iterating bpf objects with CAP_BPF in bpf namespace Date: Sun, 26 Mar 2023 09:22:06 +0000 Message-Id: <20230326092208.13613-12-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC CAP_SYS_ADMIN is no longer required to iterate bpf objects when the caller is in a non-init bpf namespace and has CAP_BPF. Such a user can iterate the bpf maps, progs, and links in their own bpf namespace, but cannot iterate bpf objects in a different bpf namespace.
Signed-off-by: Yafang Shao --- include/linux/bpf_namespace.h | 8 ++++++++ kernel/bpf/syscall.c | 10 +++++----- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf_namespace.h b/include/linux/bpf_namespace.h index 50bd68c..f484791 100644 --- a/include/linux/bpf_namespace.h +++ b/include/linux/bpf_namespace.h @@ -5,6 +5,7 @@ #include #include #include +#include struct ubpf_obj_id { int nr; @@ -79,4 +80,11 @@ static inline int bpf_obj_id_vnr(struct bpf_obj_id *obj_id) { return bpf_obj_id_nr_ns(obj_id, current->nsproxy->bpf_ns); } + +static inline bool bpfns_capable(void) +{ + if (current->nsproxy->bpf_ns != &init_bpf_ns && capable(CAP_BPF)) + return true; + return false; +} #endif /* _LINUX_BPF_ID_NS_H */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 855d5f7..8a72694 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3628,7 +3628,7 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr, if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX) return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) + if (!capable(CAP_SYS_ADMIN) && !bpfns_capable()) return -EPERM; next_id++; @@ -3712,7 +3712,7 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr) if (CHECK_ATTR(BPF_PROG_GET_FD_BY_ID)) return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) + if (!capable(CAP_SYS_ADMIN) && !bpfns_capable()) return -EPERM; prog = bpf_prog_by_id(id); @@ -3740,7 +3740,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr) attr->open_flags & ~BPF_OBJ_FLAG_MASK) return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) + if (!capable(CAP_SYS_ADMIN) && !bpfns_capable()) return -EPERM; f_flags = bpf_get_file_flag(attr->open_flags); @@ -4386,7 +4386,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr, if (CHECK_ATTR(BPF_TASK_FD_QUERY)) return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) + if (!capable(CAP_SYS_ADMIN) && !bpfns_capable()) return -EPERM; if (attr->task_fd_query.flags != 0) @@ -4781,7 +4781,7 @@ static int 
bpf_link_get_fd_by_id(const union bpf_attr *attr) if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID)) return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) + if (!capable(CAP_SYS_ADMIN) && !bpfns_capable()) return -EPERM; link = bpf_link_by_id(id); From patchwork Sun Mar 26 09:22:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188021 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91E3BC74A5B for ; Sun, 26 Mar 2023 09:23:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232075AbjCZJXI (ORCPT ); Sun, 26 Mar 2023 05:23:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44036 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232008AbjCZJWv (ORCPT ); Sun, 26 Mar 2023 05:22:51 -0400 Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DA2BA245; Sun, 26 Mar 2023 02:22:28 -0700 (PDT) Received: by mail-qt1-x836.google.com with SMTP id g19so5857127qts.9; Sun, 26 Mar 2023 02:22:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822548; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=uSwgby0UrYndbAxTM77AgxLovcrhl/Mtkc5Vceujv1A=; b=T8x9POr0Lj1AqtVdVvWfyzSmtHtHn2wyWRjHAsTh+AtYmG4W8fa7xJrjXMtItajUm2 D7uC04UFqTWRyv2z88AjdBNPAVHbc2yH+jkkn1CRECrA592SOZUZypoupsbsCkV05HYO 8eRramfOGqMbt9kRg0hNCktEIaMYNdup6Kw0y8xFKl1BI+kM04G4Zgbak624ghy1/hSU TLgld8TVf8JeVlY3bYLrwPBN9L6YPIgT+e3u6X6qkE7YwbtdUPVDLLbdLqymkuSsb9pQ 
FxxW84SucUSkRKh1ccuFeLT9jIwFs76Nszp+20uI1o0Sqq7usuOQc7LpmlKeYlTveJzR kwIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822548; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uSwgby0UrYndbAxTM77AgxLovcrhl/Mtkc5Vceujv1A=; b=rlqZrJNYnPzSmprNkJr/8+EV1FP1eKMQ55JoB/kGFAle3sAiTtBhHwNDRhFNZ7GCNA QQlcxGFO1OCze2HD3ckUXu1KFhRmZuZlc8KRo0MujFQ18hhcK38eXOX7vxX0ITT/qk9N k/PghQciXQquytAKbFkPxCkecDGpA7s52x7iwYdIouL1vXFI53AsSlvOr+WXrqqnOHvr NS381C0T+4Zow5bPNHWmWCLEusVGVtUUWDb2tkMnnosUcoaV8qiRcfLaTQR4M1U5Plkb RgcP1qxwYPD87SINv9uMyYpXrCJp2K9mNT0jUAejOuDb+QZ+1K6znGlu9d5RvB3xFSj7 WPMg== X-Gm-Message-State: AO0yUKV3Gga8oqsQrrNoD25KL8O7retwWX5IYttORYH8Qp55tnvtBkNx q5PMFf8WkNdroIqd84+KRjE= X-Google-Smtp-Source: AK7set8gkEDowD9aPU7oLgh91OWH0kdZjlTpcpSbqOXtKxC2i3dKsI2PZ9nf7IIsIxs0s9Jf5mZqDQ== X-Received: by 2002:ac8:5a09:0:b0:3e3:9185:cb15 with SMTP id n9-20020ac85a09000000b003e39185cb15mr14418072qta.7.1679822547864; Sun, 26 Mar 2023 02:22:27 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:27 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 12/13] bpf: Use bpf_idr_lock array instead Date: Sun, 26 Mar 2023 09:22:07 +0000 Message-Id: <20230326092208.13613-13-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> 
MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Replace the separate map_idr_lock, prog_idr_lock and link_idr_lock with a single bpf_idr_lock[] array indexed by object type, which makes the code clearer. This is a cleanup. Signed-off-by: Yafang Shao --- include/linux/bpf_namespace.h | 4 +--- kernel/bpf/bpf_namespace.c | 37 ++++++------------------------------- kernel/bpf/syscall.c | 42 +++++++++++++++++++++--------------------- 3 files changed, 28 insertions(+), 55 deletions(-) diff --git a/include/linux/bpf_namespace.h b/include/linux/bpf_namespace.h index f484791..4d58986 100644 --- a/include/linux/bpf_namespace.h +++ b/include/linux/bpf_namespace.h @@ -39,9 +39,7 @@ struct bpf_namespace { extern struct bpf_namespace init_bpf_ns; extern struct proc_ns_operations bpfns_operations; -extern spinlock_t map_idr_lock; -extern spinlock_t prog_idr_lock; -extern spinlock_t link_idr_lock; +extern spinlock_t bpf_idr_lock[OBJ_ID_NUM]; struct bpf_namespace *copy_bpfns(unsigned long flags, struct user_namespace *user_ns, diff --git a/kernel/bpf/bpf_namespace.c b/kernel/bpf/bpf_namespace.c index c7d62ef..51c240f 100644 --- a/kernel/bpf/bpf_namespace.c +++ b/kernel/bpf/bpf_namespace.c @@ -11,9 +11,7 @@ #include #define MAX_BPF_NS_LEVEL 32 -DEFINE_SPINLOCK(map_idr_lock); -DEFINE_SPINLOCK(prog_idr_lock); -DEFINE_SPINLOCK(link_idr_lock); +spinlock_t bpf_idr_lock[OBJ_ID_NUM]; static struct kmem_cache *bpfns_cachep; static struct kmem_cache *obj_id_cache[MAX_PID_NS_LEVEL]; static struct ns_common *bpfns_get(struct task_struct *task); @@ -208,8 +206,10 @@ static void __init bpfns_idr_init(void) init_bpf_ns.obj_id_cachep = KMEM_CACHE(pid, SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT); - for (i = 0; i < OBJ_ID_NUM; i++) + for (i = 0; i < OBJ_ID_NUM; i++) { idr_init(&init_bpf_ns.idr[i]); + spin_lock_init(&bpf_idr_lock[i]); + } } static __init int bpf_namespaces_init(void) @@ -231,24 +231,11 @@ struct bpf_obj_id *bpf_alloc_obj_id(struct bpf_namespace *ns, int id; int i; - switch (type) { -
case MAP_OBJ_ID: - idr_lock = &map_idr_lock; - break; - case PROG_OBJ_ID: - idr_lock = &prog_idr_lock; - break; - case LINK_OBJ_ID: - idr_lock = &link_idr_lock; - break; - default: - return ERR_PTR(-EINVAL); - } - obj_id = kmem_cache_alloc(ns->obj_id_cachep, GFP_KERNEL); if (!obj_id) return ERR_PTR(-ENOMEM); + idr_lock = &bpf_idr_lock[type]; obj_id->level = ns->level; for (i = ns->level; i >= 0; i--) { idr_preload(GFP_KERNEL); @@ -283,19 +270,7 @@ void bpf_free_obj_id(struct bpf_obj_id *obj_id, int type) unsigned long flags; int i; - switch (type) { - case MAP_OBJ_ID: - idr_lock = &map_idr_lock; - break; - case PROG_OBJ_ID: - idr_lock = &prog_idr_lock; - break; - case LINK_OBJ_ID: - idr_lock = &link_idr_lock; - break; - default: - return; - } + idr_lock = &bpf_idr_lock[type]; /* Note that the level-0 should be freed at last */ for (i = obj_id->level; i >= 0; i--) { spin_lock_irqsave(idr_lock, flags); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 8a72694..7cbaaa9 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1269,7 +1269,7 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd) return map; } -/* map_idr_lock should have been held or the map should have been +/* map idr_lock should have been held or the map should have been * protected by rcu read lock. 
*/ struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref) @@ -1287,9 +1287,9 @@ struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref) struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map) { - spin_lock_bh(&map_idr_lock); + spin_lock_bh(&bpf_idr_lock[MAP_OBJ_ID]); map = __bpf_map_inc_not_zero(map, false); - spin_unlock_bh(&map_idr_lock); + spin_unlock_bh(&bpf_idr_lock[MAP_OBJ_ID]); return map; } @@ -2195,7 +2195,7 @@ void bpf_prog_inc(struct bpf_prog *prog) } EXPORT_SYMBOL_GPL(bpf_prog_inc); -/* prog_idr_lock should have been held */ +/* prog idr_lock should have been held */ struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog) { int refold; @@ -2836,10 +2836,10 @@ int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer) int bpf_link_settle(struct bpf_link_primer *primer) { /* make bpf_link fetchable by ID */ - spin_lock_bh(&link_idr_lock); + spin_lock_bh(&bpf_idr_lock[LINK_OBJ_ID]); primer->link->id = primer->id; primer->link->obj_id = primer->obj_id; - spin_unlock_bh(&link_idr_lock); + spin_unlock_bh(&bpf_idr_lock[LINK_OBJ_ID]); /* make bpf_link fetchable by FD */ fd_install(primer->fd, primer->file); /* pass through installed FD */ @@ -3648,7 +3648,7 @@ struct bpf_map *bpf_map_get_curr_or_next(u32 *id) struct bpf_namespace *ns = current->nsproxy->bpf_ns; struct bpf_map *map; - spin_lock_bh(&map_idr_lock); + spin_lock_bh(&bpf_idr_lock[MAP_OBJ_ID]); again: map = idr_get_next(&ns->idr[MAP_OBJ_ID], id); if (map) { @@ -3658,7 +3658,7 @@ struct bpf_map *bpf_map_get_curr_or_next(u32 *id) goto again; } } - spin_unlock_bh(&map_idr_lock); + spin_unlock_bh(&bpf_idr_lock[MAP_OBJ_ID]); return map; } @@ -3668,7 +3668,7 @@ struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id) struct bpf_namespace *ns = current->nsproxy->bpf_ns; struct bpf_prog *prog; - spin_lock_bh(&prog_idr_lock); + spin_lock_bh(&bpf_idr_lock[PROG_OBJ_ID]); again: prog = idr_get_next(&ns->idr[PROG_OBJ_ID], id); if (prog) { @@ -3678,7 +3678,7 @@ 
struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id) goto again; } } - spin_unlock_bh(&prog_idr_lock); + spin_unlock_bh(&bpf_idr_lock[PROG_OBJ_ID]); return prog; } @@ -3693,13 +3693,13 @@ struct bpf_prog *bpf_prog_by_id(u32 id) if (!id) return ERR_PTR(-ENOENT); - spin_lock_bh(&prog_idr_lock); + spin_lock_bh(&bpf_idr_lock[PROG_OBJ_ID]); prog = idr_find(&ns->idr[PROG_OBJ_ID], id); if (prog) prog = bpf_prog_inc_not_zero(prog); else prog = ERR_PTR(-ENOENT); - spin_unlock_bh(&prog_idr_lock); + spin_unlock_bh(&bpf_idr_lock[PROG_OBJ_ID]); return prog; } @@ -3747,13 +3747,13 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr) if (f_flags < 0) return f_flags; - spin_lock_bh(&map_idr_lock); + spin_lock_bh(&bpf_idr_lock[MAP_OBJ_ID]); map = idr_find(&ns->idr[MAP_OBJ_ID], id); if (map) map = __bpf_map_inc_not_zero(map, true); else map = ERR_PTR(-ENOENT); - spin_unlock_bh(&map_idr_lock); + spin_unlock_bh(&bpf_idr_lock[MAP_OBJ_ID]); if (IS_ERR(map)) return PTR_ERR(map); @@ -4735,7 +4735,7 @@ struct bpf_link *bpf_link_by_id(u32 id) if (!id) return ERR_PTR(-ENOENT); - spin_lock_bh(&link_idr_lock); + spin_lock_bh(&bpf_idr_lock[LINK_OBJ_ID]); /* before link is "settled", ID is 0, pretend it doesn't exist yet */ link = idr_find(&ns->idr[LINK_OBJ_ID], id); if (link) { @@ -4746,7 +4746,7 @@ struct bpf_link *bpf_link_by_id(u32 id) } else { link = ERR_PTR(-ENOENT); } - spin_unlock_bh(&link_idr_lock); + spin_unlock_bh(&bpf_idr_lock[LINK_OBJ_ID]); return link; } @@ -4755,7 +4755,7 @@ struct bpf_link *bpf_link_get_curr_or_next(u32 *id) struct bpf_namespace *ns = current->nsproxy->bpf_ns; struct bpf_link *link; - spin_lock_bh(&link_idr_lock); + spin_lock_bh(&bpf_idr_lock[LINK_OBJ_ID]); again: link = idr_get_next(&ns->idr[LINK_OBJ_ID], id); if (link) { @@ -4765,7 +4765,7 @@ struct bpf_link *bpf_link_get_curr_or_next(u32 *id) goto again; } } - spin_unlock_bh(&link_idr_lock); + spin_unlock_bh(&bpf_idr_lock[LINK_OBJ_ID]); return link; } @@ -5011,11 +5011,11 @@ static int __sys_bpf(int 
cmd, bpfptr_t uattr, unsigned int size) break; case BPF_PROG_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, - &ns->idr[PROG_OBJ_ID], &prog_idr_lock); + &ns->idr[PROG_OBJ_ID], &bpf_idr_lock[PROG_OBJ_ID]); break; case BPF_MAP_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, - &ns->idr[MAP_OBJ_ID], &map_idr_lock); + &ns->idr[MAP_OBJ_ID], &bpf_idr_lock[MAP_OBJ_ID]); break; case BPF_BTF_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, @@ -5069,7 +5069,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) break; case BPF_LINK_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, - &ns->idr[LINK_OBJ_ID], &link_idr_lock); + &ns->idr[LINK_OBJ_ID], &bpf_idr_lock[LINK_OBJ_ID]); break; case BPF_ENABLE_STATS: err = bpf_enable_stats(&attr); From patchwork Sun Mar 26 09:22:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 13188022 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14EE3C77B60 for ; Sun, 26 Mar 2023 09:23:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231958AbjCZJXJ (ORCPT ); Sun, 26 Mar 2023 05:23:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231959AbjCZJWv (ORCPT ); Sun, 26 Mar 2023 05:22:51 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E7C1EA271; Sun, 26 Mar 2023 02:22:30 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id ga7so5890736qtb.2; Sun, 26 Mar 2023 02:22:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679822548; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=O7x7mlObk0mq6o5G4acCq3J/Q3Teazds0AbiZGRGfrs=; b=jH3+gOe0IubDUQiVblq9CC8G+sjt1vhQU+uyAo7rRHDtbXyRUrTQxeklq+IOr53MbF +JYKWg1fZnDvfmrwc+gsil+GBkuqcn2x09nVlYt0LXju/gaIOUm3TRiZfMcRNEKDLmY2 sHZTTsTM7IQQuu7BP0hWxOYAsdg1RiD4HM8lNSWRZc2DiOY9zbVyy4viinugjDtW5OeL HBXxrunxHvVUw08WRg41zi+FecGZpIJ+EJTWpWdbOYwQ06eKSTp0a7ERHLAkglIRyWBf MZrumVFQ/Gt3TqWo47rtHv7aWHQZXeacEhaVZi2bRBHmgiXq+ykd/ZtjMXYCTsaTzyvz vKZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679822548; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=O7x7mlObk0mq6o5G4acCq3J/Q3Teazds0AbiZGRGfrs=; b=3tSr0Pb7PiLPyRM8+r+jfKfjhskfqOO1gQjF7l55tQ4NxIyYAPsmqqjiHJ5lfEoRum C5ARxjqq8PpAIixfa/P8Koh7UuADOzIUk3w3Ri7thPID4G889z8Myl/7f6Eli0f9L7xE EwmFcuBL75X0hiAsH2f/xEUGIClYVhDYIYDl+29uJYJxnbsucCDuOI1Bh/BkpVAIjEJP ZPVDaEoIrstcu39qAd8hZ/TffdaC/s0ikBvu8SQKRd7jzJ5vBAmUt7OytvW1AFDHBmuV 4VqW51gHKa1cC3UWAJSdZH8fStRT6yFykSOL8tCoin/o6vqyu9dByljSVIWlKRxAr2Z5 r8Ew== X-Gm-Message-State: AAQBX9fsiALRR3nnyX/VR6BN8BgizeZecIbj8xwaNdq5JsmQXP28Fgno NBJePNfP2vLooIUNOSHNMEs= X-Google-Smtp-Source: AKy350ZWZWKF7vBQ3o37aJcdEl7aHQL8ZhsChADAhPyug9b+hHQnb7kOo9oRsLXo54i0B5djN2emww== X-Received: by 2002:a05:622a:494:b0:3e4:e5bf:a24c with SMTP id p20-20020a05622a049400b003e4e5bfa24cmr1497323qtx.62.1679822548595; Sun, 26 Mar 2023 02:22:28 -0700 (PDT) Received: from vultr.guest ([2001:19f0:1000:1a1f:5400:4ff:fe5e:1d32]) by smtp.gmail.com with ESMTPSA id y5-20020ac87085000000b003e014845d9esm10257987qto.74.2023.03.26.02.22.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Mar 2023 02:22:28 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, 
kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [RFC PATCH bpf-next 13/13] selftests/bpf: Add selftest for bpf namespace Date: Sun, 26 Mar 2023 09:22:08 +0000 Message-Id: <20230326092208.13613-14-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230326092208.13613-1-laoar.shao@gmail.com> References: <20230326092208.13613-1-laoar.shao@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add a simple test case for the newly introduced bpf namespace. It creates a map in the init bpf namespace, clones a child with CLONE_NEWBPF, and checks that the first map created in the child gets ID 1. Signed-off-by: Yafang Shao --- tools/testing/selftests/bpf/Makefile | 3 +- tools/testing/selftests/bpf/test_bpfns.c | 76 ++++++++++++++++++++++++++++++++ 2 files changed, 78 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/test_bpfns.c diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 4a8ef11..55f0aeb 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -40,7 +40,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test test_sock test_sockmap get_cgroup_id_user \ test_cgroup_storage \ test_tcpnotify_user test_sysctl \ - test_progs-no_alu32 + test_progs-no_alu32 test_bpfns # Also test bpf-gcc, if present ifneq ($(BPF_GCC),) @@ -255,6 +255,7 @@ $(OUTPUT)/flow_dissector_load: $(TESTING_HELPERS) $(OUTPUT)/test_maps: $(TESTING_HELPERS) $(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS) $(UNPRIV_HELPERS) $(OUTPUT)/xsk.o: $(BPFOBJ) +$(OUTPUT)/test_bpfns: $(TESTING_HELPERS) BPFTOOL ?= $(DEFAULT_BPFTOOL) $(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \ diff --git a/tools/testing/selftests/bpf/test_bpfns.c b/tools/testing/selftests/bpf/test_bpfns.c new file
mode 100644 index 0000000..7baebe2 --- /dev/null +++ b/tools/testing/selftests/bpf/test_bpfns.c @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef _GNU_SOURCE +#define _GNU_SOURCE 1 +#endif +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +static int create_bpf_map(const char *name) +{ + static struct bpf_map_create_opts map_opts = { + .sz = sizeof(map_opts), + }; + unsigned int value; + unsigned int key; + int map_fd; + + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, name, sizeof(key), + sizeof(value), 1, &map_opts); + if (map_fd < 0) + fprintf(stderr, "%s - Failed to create map\n", strerror(errno)); + return map_fd; +} + + +int main(int argc, char *argv[]) +{ + struct bpf_map_info info = {}; + __u32 info_len = sizeof(info); + struct clone_args args = { + .flags = 0x400000000ULL, /* CLONE_NEWBPF */ + .exit_signal = SIGCHLD, + }; + int map_fd, child_map_fd; + pid_t pid; + + /* Create a map in init bpf namespace. */ + map_fd = create_bpf_map("map_in_init"); + if (map_fd < 0) + exit(EXIT_FAILURE); + pid = syscall(__NR_clone3, &args, sizeof(struct clone_args)); + if (pid < 0) { + fprintf(stderr, "%s - Failed to create new process\n", strerror(errno)); + exit(EXIT_FAILURE); + } + + if (pid == 0) { + struct bpf_map_info info = {}; + + /* In a new bpf namespace, it is the first map. */ + child_map_fd = create_bpf_map("map_in_bpfns"); + if (child_map_fd < 0) + exit(EXIT_FAILURE); + bpf_obj_get_info_by_fd(child_map_fd, &info, &info_len); + assert(info.id == 1); + exit(EXIT_SUCCESS); + } + + if (waitpid(pid, NULL, 0) != pid) { + fprintf(stderr, "Failed to wait on child process\n"); + exit(EXIT_FAILURE); + } + + return 0; +}
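
Note on the selftest above: the parent calls waitpid(pid, NULL, 0) and discards the child's exit status, so a failed assert() in the child (which terminates it with SIGABRT) would probably not make test_bpfns itself fail. A minimal sketch of how the parent could check the status follows. It assumes a helper named child_succeeded(), which is hypothetical and not part of this series; only standard POSIX calls are used.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch series): reap the child
 * and report whether it exited normally with EXIT_SUCCESS.
 *
 * Returns 1 on success, 0 if waitpid() failed, the child was killed
 * by a signal (e.g. SIGABRT from a failed assert()), or the child
 * exited with a non-zero status.
 */
static int child_succeeded(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) != pid)
		return 0;

	return WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS;
}
```

In the test's main(), the final waitpid() block could then become a check such as `if (!child_succeeded(pid)) exit(EXIT_FAILURE);`, so that a wrong map ID in the child is reported by the parent.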