From patchwork Mon Aug 14 17:28:14 2023
X-Patchwork-Id: 13353129
From: Yonghong Song
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team@fb.com, Martin KaFai Lau
Subject: [PATCH bpf-next 01/15] bpf: Add support for non-fix-size percpu mem allocation
Date: Mon, 14 Aug 2023 10:28:14 -0700
Message-Id: <20230814172814.1362339-1-yonghong.song@linux.dev>
In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev>
References: <20230814172809.1361446-1-yonghong.song@linux.dev>

This is needed for later percpu memory allocation when the allocation is
done by a bpf program. For such cases, a global bpf_global_percpu_ma is
added, since a flexible allocation size is needed there rather than one
fixed unit size.
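For context, initializing the allocator with size == 0 (and percpu == true)
sets it up with a range of size buckets instead of a single unit size, so
each allocation can pick its size at call time. A minimal sketch of the
intended kernel-side usage, under stated assumptions (struct my_val is a
placeholder type; error handling trimmed):

    /* Sketch only: assumes bpf_global_percpu_ma was initialized with
     * bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true), i.e. no fixed
     * unit size, so each request is served from a matching size bucket.
     */
    struct my_val { u64 a, b; };    /* placeholder type */
    void *p;

    p = bpf_mem_alloc(&bpf_global_percpu_ma, sizeof(struct my_val));
    if (p) {
            /* ... *(void __percpu **)p is the per-cpu pointer;
             * access per-cpu copies via per_cpu_ptr() ...
             */
            bpf_mem_free_rcu(&bpf_global_percpu_ma, p);
    }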
Signed-off-by: Yonghong Song
---
 include/linux/bpf.h   |  4 ++--
 kernel/bpf/core.c     |  8 +++++---
 kernel/bpf/memalloc.c | 14 ++++++--------
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cfabbcf47bdb..60e80e90c37d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -55,8 +55,8 @@ struct cgroup;
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
 extern struct kobject *btf_kobj;
-extern struct bpf_mem_alloc bpf_global_ma;
-extern bool bpf_global_ma_set;
+extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
+extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;

 typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 0f8f036d8bd1..95599df82ee4 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -64,8 +64,8 @@
 #define OFF	insn->off
 #define IMM	insn->imm

-struct bpf_mem_alloc bpf_global_ma;
-bool bpf_global_ma_set;
+struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
+bool bpf_global_ma_set, bpf_global_percpu_ma_set;

 /* No hurry in this branch
  *
@@ -2921,7 +2921,9 @@ static int __init bpf_global_ma_init(void)

 	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
 	bpf_global_ma_set = !ret;
-	return ret;
+	ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
+	bpf_global_percpu_ma_set = !ret;
+	return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
 }
 late_initcall(bpf_global_ma_init);
 #endif
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 9c49ae53deaf..cb60445de98a 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -499,15 +499,16 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 	struct obj_cgroup *objcg = NULL;
 	int cpu, i, unit_size, percpu_size = 0;

+	/* room for llist_node and per-cpu pointer */
+	if (percpu)
+		percpu_size = LLIST_NODE_SZ + sizeof(void *);
+
 	if (size) {
 		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
 		if (!pc)
 			return -ENOMEM;

-		if (percpu)
-			/* room for llist_node and per-cpu pointer */
-			percpu_size = LLIST_NODE_SZ + sizeof(void *);
-		else
+		if (!percpu)
 			size += LLIST_NODE_SZ; /* room for llist_node */
 		unit_size = size;

@@ -527,10 +528,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 		return 0;
 	}

-	/* size == 0 && percpu is an invalid combination */
-	if (WARN_ON_ONCE(percpu))
-		return -EINVAL;
-
 	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
 	if (!pcc)
 		return -ENOMEM;
@@ -543,6 +540,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 		c = &cc->cache[i];
 		c->unit_size = sizes[i];
 		c->objcg = objcg;
+		c->percpu_size = percpu_size;
 		c->tgt = c;
 		prefill_mem_cache(c, cpu);
 	}

From patchwork Mon Aug 14 17:28:20 2023
X-Patchwork-Id: 13353130
From: Yonghong Song
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team@fb.com, Martin KaFai Lau
Subject: [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type
Date: Mon, 14 Aug 2023 10:28:20 -0700
Message-Id: <20230814172820.1362751-1-yonghong.song@linux.dev>
In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev>
References: <20230814172809.1361446-1-yonghong.song@linux.dev>

BPF_KPTR_PERCPU_REF represents a percpu field type like below:

    struct val_t {
            ... fields ...
    };
    struct t {
            ...
            struct val_t __percpu *percpu_data_ptr;
            ...
    };

where

    #define __percpu __attribute__((btf_type_tag("percpu")))

While BPF_KPTR_REF points to a trusted kernel object or a trusted local
object, BPF_KPTR_PERCPU_REF points to a trusted local percpu object. This
patch adds basic support for BPF_KPTR_PERCPU_REF: percpu field parsing,
recording and free operations. BPF_KPTR_PERCPU_REF supports the same map
types as BPF_KPTR_REF does. Note that, unlike other local kptrs, a
BPF_KPTR_PERCPU_REF struct may not contain any special fields such as
another kptr, bpf_spin_lock, bpf_list_head, etc.

Signed-off-by: Yonghong Song
---
 include/linux/bpf.h  | 18 ++++++++++++------
 kernel/bpf/btf.c     |  5 +++++
 kernel/bpf/syscall.c |  7 ++++++-
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 60e80e90c37d..e6348fd0a785 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -180,14 +180,15 @@ enum btf_field_type {
 	BPF_TIMER = (1 << 1),
 	BPF_KPTR_UNREF = (1 << 2),
 	BPF_KPTR_REF = (1 << 3),
-	BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF,
-	BPF_LIST_HEAD = (1 << 4),
-	BPF_LIST_NODE = (1 << 5),
-	BPF_RB_ROOT = (1 << 6),
-	BPF_RB_NODE = (1 << 7),
+	BPF_KPTR_PERCPU_REF = (1 << 4),
+	BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF | BPF_KPTR_PERCPU_REF,
+	BPF_LIST_HEAD = (1 << 5),
+	BPF_LIST_NODE = (1 << 6),
+	BPF_RB_ROOT = (1 << 7),
+	BPF_RB_NODE = (1 << 8),
 	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
 				 BPF_RB_NODE | BPF_RB_ROOT,
-	BPF_REFCOUNT = (1 << 8),
+	BPF_REFCOUNT = (1 << 9),
 };

 typedef void (*btf_dtor_kfunc_t)(void *);
@@ -300,6 +301,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return "kptr";
+	case BPF_KPTR_PERCPU_REF:
+		return "kptr_percpu";
 	case BPF_LIST_HEAD:
 		return "bpf_list_head";
 	case BPF_LIST_NODE:
@@ -325,6 +328,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(struct bpf_timer);
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
+	case BPF_KPTR_PERCPU_REF:
 		return sizeof(u64);
 	case BPF_LIST_HEAD:
 		return sizeof(struct bpf_list_head);
@@ -351,6 +355,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(struct bpf_timer);
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
+	case BPF_KPTR_PERCPU_REF:
 		return __alignof__(u64);
 	case BPF_LIST_HEAD:
 		return __alignof__(struct bpf_list_head);
@@ -389,6 +394,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
 	case BPF_TIMER:
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
+	case BPF_KPTR_PERCPU_REF:
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 249657c466dd..bc1792edc64e 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3293,6 +3293,8 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 		type = BPF_KPTR_UNREF;
 	else if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
 		type = BPF_KPTR_REF;
+	else if (!strcmp("percpu", __btf_name_by_offset(btf, t->name_off)))
+		type = BPF_KPTR_PERCPU_REF;
 	else
 		return -EINVAL;

@@ -3457,6 +3459,7 @@ static int btf_find_struct_field(const struct btf *btf,
 			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU_REF:
 			ret = btf_find_kptr(btf, member_type, off, sz,
 					    idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3523,6 +3526,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU_REF:
 			ret = btf_find_kptr(btf, var_type, off, sz,
 					    idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3783,6 +3787,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU_REF:
 			ret = btf_parse_kptr(btf, &rec->fields[i], &info_arr[i]);
 			if (ret < 0)
 				goto end;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7f4e8c357a6a..1c30b6ee84d4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -514,6 +514,7 @@ void btf_record_free(struct btf_record *rec)
 		switch (rec->fields[i].type) {
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU_REF:
 			if (rec->fields[i].kptr.module)
 				module_put(rec->fields[i].kptr.module);
 			btf_put(rec->fields[i].kptr.btf);
@@ -560,6 +561,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 		switch (fields[i].type) {
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU_REF:
 			btf_get(fields[i].kptr.btf);
 			if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
 				ret = -ENXIO;
@@ -650,6 +652,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 			WRITE_ONCE(*(u64 *)field_ptr, 0);
 			break;
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU_REF:
 			xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
 			if (!xchgd_field)
 				break;
@@ -657,7 +660,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 			if (!btf_is_kernel(field->kptr.btf)) {
 				pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
 									   field->kptr.btf_id);
-				WARN_ON_ONCE(!pointee_struct_meta);
+				if (field->type != BPF_KPTR_PERCPU_REF)
+					WARN_ON_ONCE(!pointee_struct_meta);
 				migrate_disable();
 				__bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
 						    pointee_struct_meta->record :
@@ -1046,6 +1050,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU_REF:
 		case BPF_REFCOUNT:
 			if (map->map_type != BPF_MAP_TYPE_HASH &&
 			    map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&

From patchwork Mon Aug 14 17:28:25 2023
X-Patchwork-Id: 13353131
From: Yonghong Song
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team@fb.com, Martin KaFai Lau
Subject: [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr
Date: Mon, 14 Aug 2023 10:28:25 -0700
Message-Id: <20230814172825.1363378-1-yonghong.song@linux.dev>
In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev>
References: <20230814172809.1361446-1-yonghong.song@linux.dev>

Add two new kfuncs, bpf_percpu_obj_new_impl() and bpf_percpu_obj_drop_impl(),
to allocate and free a percpu obj. The two functions are very similar to
bpf_obj_new_impl() and bpf_obj_drop_impl(). The major difference is related
to percpu handling.

    bpf_rcu_read_lock()
    struct val_t __percpu *v = map_val->percpu_data;
    ...
    bpf_rcu_read_unlock()

For a percpu data map_val like the above 'v', the reg->type is set as

    PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU

if inside an rcu critical section. The MEM_RCU marking here is similar to
NON_OWN_REF, as 'v' is not an owning reference. But NON_OWN_REF is trusted
and typically used inside a spinlock, while MEM_RCU is under an rcu read
lock. RCU is preferred here since percpu data structures imply potential
concurrent access to their contents.

Also, bpf_percpu_obj_new_impl() is restricted to accept only a scalar
struct, which means nested kptrs are not allowed, while some other special
fields, e.g., bpf_list_head, bpf_spin_lock, etc., could be nested (as a
nested 'struct'). A later patch will improve the verifier to handle such
nested special fields.
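To make the "struct of scalars" restriction concrete, a sketch of what the
verifier will and will not accept (type names here are made up;
bpf_percpu_obj_new() is the convenience wrapper added later in this series):

    struct val_t {
            long a, b;
            int arr[4];            /* scalars and arrays of scalars: accepted */
    };
    struct bad_t {
            struct foo __kptr *p;  /* pointer/special field: rejected for now */
    };

    struct val_t __percpu *v = bpf_percpu_obj_new(struct val_t);  /* ok */
    struct bad_t __percpu *w = bpf_percpu_obj_new(struct bad_t);
    /* fails to load: "bpf_percpu_obj_new type ID argument must be of
     * a struct of scalars"
     */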
Signed-off-by: Yonghong Song
---
 include/linux/bpf.h   |  3 +-
 kernel/bpf/helpers.c  | 49 +++++++++++++++++++++++
 kernel/bpf/syscall.c  | 21 +++++---
 kernel/bpf/verifier.c | 90 ++++++++++++++++++++++++++++++++++---------
 4 files changed, 137 insertions(+), 26 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e6348fd0a785..a2cb380c43c7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -197,7 +197,8 @@ struct btf_field_kptr {
 	struct btf *btf;
 	struct module *module;
 	/* dtor used if btf_is_kernel(btf), otherwise the type is
-	 * program-allocated, dtor is NULL, and __bpf_obj_drop_impl is used
+	 * program-allocated, dtor is NULL, and __bpf_obj_drop_impl
+	 * or __bpf_percpu_drop_impl is used
 	 */
 	btf_dtor_kfunc_t dtor;
 	u32 btf_id;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index eb91cae0612a..dd14cb7da4af 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1900,6 +1900,29 @@ __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
 	return p;
 }

+__bpf_kfunc void *bpf_percpu_obj_new_impl(u64 local_type_id__k, void *meta__ign)
+{
+	struct btf_struct_meta *meta = meta__ign;
+	const struct btf_record *rec;
+	u64 size = local_type_id__k;
+	void __percpu *pptr;
+	void *p;
+	int cpu;
+
+	p = bpf_mem_alloc(&bpf_global_percpu_ma, size);
+	if (!p)
+		return NULL;
+	if (meta) {
+		pptr = *((void __percpu **)p);
+		rec = meta->record;
+		for_each_possible_cpu(cpu) {
+			bpf_obj_init(rec, per_cpu_ptr(pptr, cpu));
+		}
+	}
+
+	return p;
+}
+
 /* Must be called under migrate_disable(), as required by bpf_mem_free */
 void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
 {
@@ -1924,6 +1947,30 @@ __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
 	__bpf_obj_drop_impl(p, meta ? meta->record : NULL);
 }

+/* Must be called under migrate_disable(), as required by bpf_mem_free_rcu */
+void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec)
+{
+	void __percpu *pptr;
+	int cpu;
+
+	if (rec) {
+		pptr = *((void __percpu **)p);
+		for_each_possible_cpu(cpu) {
+			bpf_obj_free_fields(rec, per_cpu_ptr(pptr, cpu));
+		}
+	}
+
+	bpf_mem_free_rcu(&bpf_global_percpu_ma, p);
+}
+
+__bpf_kfunc void bpf_percpu_obj_drop_impl(void *p__alloc, void *meta__ign)
+{
+	struct btf_struct_meta *meta = meta__ign;
+	void *p = p__alloc;
+
+	__bpf_percpu_obj_drop_impl(p, meta ? meta->record : NULL);
+}
+
 __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
 {
 	struct btf_struct_meta *meta = meta__ign;
@@ -2436,7 +2483,9 @@ BTF_SET8_START(generic_btf_ids)
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
 BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_percpu_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_percpu_obj_drop_impl, KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_push_front_impl)
 BTF_ID_FLAGS(func, bpf_list_push_back_impl)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1c30b6ee84d4..9ceb6fd9a0e2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -627,6 +627,7 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
 }

 extern void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
+extern void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec);

 void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 {
@@ -660,13 +661,21 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 			if (!btf_is_kernel(field->kptr.btf)) {
 				pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
 									   field->kptr.btf_id);
-				if (field->type != BPF_KPTR_PERCPU_REF)
+
+				if (field->type == BPF_KPTR_PERCPU_REF) {
+					migrate_disable();
+					__bpf_percpu_obj_drop_impl(xchgd_field, pointee_struct_meta ?
+								   pointee_struct_meta->record :
+								   NULL);
+					migrate_enable();
+				} else {
 					WARN_ON_ONCE(!pointee_struct_meta);
-				migrate_disable();
-				__bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
-						    pointee_struct_meta->record :
-						    NULL);
-				migrate_enable();
+					migrate_disable();
+					__bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
+							    pointee_struct_meta->record :
+							    NULL);
+					migrate_enable();
+				}
 			} else {
 				field->kptr.dtor(xchgd_field);
 			}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ccca1f6c998..a985fbf18a11 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -304,7 +304,7 @@ struct bpf_kfunc_call_arg_meta {
 	/* arg_{btf,btf_id,owning_ref} are used by kfunc-specific handling,
 	 * generally to pass info about user-defined local kptr types to later
 	 * verification logic
-	 *   bpf_obj_drop
+	 *   bpf_obj_drop/bpf_percpu_obj_drop
 	 *     Record the local kptr type to be drop'd
 	 *   bpf_refcount_acquire (via KF_ARG_PTR_TO_REFCOUNTED_KPTR arg type)
 	 *     Record the local kptr type to be refcount_incr'd and use
@@ -4997,13 +4997,20 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 	if (kptr_field->type == BPF_KPTR_UNREF)
 		perm_flags |= PTR_UNTRUSTED;

+	if (kptr_field->type == BPF_KPTR_PERCPU_REF)
+		perm_flags |= MEM_PERCPU | MEM_ALLOC;
+
 	if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
 		goto bad_type;

-	if (!btf_is_kernel(reg->btf)) {
+	if (kptr_field->type != BPF_KPTR_PERCPU_REF && !btf_is_kernel(reg->btf)) {
 		verbose(env, "R%d must point to kernel BTF\n", regno);
 		return -EINVAL;
 	}
+	if (kptr_field->type == BPF_KPTR_PERCPU_REF && btf_is_kernel(reg->btf)) {
+		verbose(env, "R%d must point to prog BTF\n", regno);
+		return -EINVAL;
+	}

 	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
 	reg_name = btf_type_name(reg->btf, reg->btf_id);
@@ -5084,7 +5091,17 @@ static bool rcu_safe_kptr(const struct btf_field *field)
 {
 	const struct btf_field_kptr *kptr = &field->kptr;

-	return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id);
+	return field->type == BPF_KPTR_PERCPU_REF ||
+	       (field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id));
+}
+
+static u32 btf_ld_kptr_type(struct bpf_verifier_env *env, struct btf_field *kptr_field)
+{
+	if (!rcu_safe_kptr(kptr_field) || !in_rcu_cs(env))
+		return PTR_MAYBE_NULL | PTR_UNTRUSTED;
+	if (kptr_field->type != BPF_KPTR_PERCPU_REF)
+		return PTR_MAYBE_NULL | MEM_RCU;
+	return PTR_MAYBE_NULL | MEM_RCU | MEM_PERCPU;
 }

 static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
@@ -5110,7 +5127,8 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	/* We only allow loading referenced kptr, since it will be marked as
 	 * untrusted, similar to unreferenced kptr.
 	 */
-	if (class != BPF_LDX && kptr_field->type == BPF_KPTR_REF) {
+	if (class != BPF_LDX &&
+	    (kptr_field->type == BPF_KPTR_REF || kptr_field->type == BPF_KPTR_PERCPU_REF)) {
 		verbose(env, "store to referenced kptr disallowed\n");
 		return -EACCES;
 	}
@@ -5121,10 +5139,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
 		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf,
-				kptr_field->kptr.btf_id,
-				rcu_safe_kptr(kptr_field) && in_rcu_cs(env) ?
-				PTR_MAYBE_NULL | MEM_RCU :
-				PTR_MAYBE_NULL | PTR_UNTRUSTED);
+				kptr_field->kptr.btf_id, btf_ld_kptr_type(env, kptr_field));
 		/* For mark_ptr_or_null_reg */
 		val_reg->id = ++env->id_gen;
 	} else if (class == BPF_STX) {
@@ -5178,6 +5193,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			switch (field->type) {
 			case BPF_KPTR_UNREF:
 			case BPF_KPTR_REF:
+			case BPF_KPTR_PERCPU_REF:
 				if (src != ACCESS_DIRECT) {
 					verbose(env, "kptr cannot be accessed indirectly by helper\n");
 					return -EACCES;
@@ -7316,7 +7332,7 @@ static int process_kptr_func(struct bpf_verifier_env *env, int regno,
 		verbose(env, "off=%d doesn't point to kptr\n", kptr_off);
 		return -EACCES;
 	}
-	if (kptr_field->type != BPF_KPTR_REF) {
+	if (kptr_field->type != BPF_KPTR_REF && kptr_field->type != BPF_KPTR_PERCPU_REF) {
 		verbose(env, "off=%d kptr isn't referenced kptr\n", kptr_off);
 		return -EACCES;
 	}
@@ -7827,8 +7843,10 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	if (base_type(arg_type) == ARG_PTR_TO_MEM)
 		type &= ~DYNPTR_TYPE_FLAG_MASK;

-	if (meta->func_id == BPF_FUNC_kptr_xchg && type_is_alloc(type))
+	if (meta->func_id == BPF_FUNC_kptr_xchg && type_is_alloc(type)) {
 		type &= ~MEM_ALLOC;
+		type &= ~MEM_PERCPU;
+	}

 	for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
 		expected = compatible->types[i];
@@ -7918,6 +7936,14 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 		}
 		/* Handled by helper specific checks */
 		break;
+	case PTR_TO_BTF_ID | MEM_PERCPU | MEM_ALLOC:
+		if (meta->func_id != BPF_FUNC_kptr_xchg) {
+			verbose(env, "verifier internal error: unimplemented handling of MEM_PERCPU | MEM_ALLOC\n");
+			return -EFAULT;
+		}
+		if (map_kptr_match_type(env, meta->kptr_field, reg, regno))
+			return -EACCES;
+		break;
 	case PTR_TO_BTF_ID | MEM_PERCPU:
 	case PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED:
 		/* Handled by helper specific checks */
@@ -9885,8 +9911,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		if (func_id == BPF_FUNC_kptr_xchg) {
 			ret_btf = meta.kptr_field->kptr.btf;
 			ret_btf_id = meta.kptr_field->kptr.btf_id;
-			if (!btf_is_kernel(ret_btf))
+			if (!btf_is_kernel(ret_btf)) {
 				regs[BPF_REG_0].type |= MEM_ALLOC;
+				if (meta.kptr_field->type == BPF_KPTR_PERCPU_REF)
+					regs[BPF_REG_0].type |= MEM_PERCPU;
+			}
 		} else {
 			if (fn->ret_btf_id == BPF_PTR_POISON) {
 				verbose(env, "verifier internal error:");
@@ -10271,6 +10300,8 @@ enum special_kfunc_type {
 	KF_bpf_dynptr_slice,
 	KF_bpf_dynptr_slice_rdwr,
 	KF_bpf_dynptr_clone,
+	KF_bpf_percpu_obj_new_impl,
+	KF_bpf_percpu_obj_drop_impl,
 };

 BTF_SET_START(special_kfunc_set)
@@ -10291,6 +10322,8 @@ BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
+BTF_ID(func, bpf_percpu_obj_new_impl)
+BTF_ID(func, bpf_percpu_obj_drop_impl)
 BTF_SET_END(special_kfunc_set)

 BTF_ID_LIST(special_kfunc_list)
@@ -10313,6 +10346,8 @@ BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
+BTF_ID(func, bpf_percpu_obj_new_impl)
+BTF_ID(func, bpf_percpu_obj_drop_impl)

 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -11003,7 +11038,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			}
 			break;
 		case KF_ARG_PTR_TO_ALLOC_BTF_ID:
-			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC) &&
+			    reg->type != (PTR_TO_BTF_ID | MEM_PERCPU | MEM_ALLOC)) {
 				verbose(env, "arg#%d expected pointer to allocated object\n", i);
 				return -EINVAL;
 			}
@@ -11012,7 +11048,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return -EINVAL;
 			}
 			if (meta->btf == btf_vmlinux &&
-			    meta->func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) {
+			    (meta->func_id == special_kfunc_list[KF_bpf_obj_drop_impl] ||
+			     meta->func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl])) {
 				meta->arg_btf = reg->btf;
 				meta->arg_btf_id = reg->btf_id;
 			}
@@ -11410,6 +11447,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		/* Only exception is bpf_obj_new_impl */
 		if (meta.btf != btf_vmlinux ||
 		    (meta.func_id != special_kfunc_list[KF_bpf_obj_new_impl] &&
+		     meta.func_id != special_kfunc_list[KF_bpf_percpu_obj_new_impl] &&
 		     meta.func_id != special_kfunc_list[KF_bpf_refcount_acquire_impl])) {
 			verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
 			return -EINVAL;
@@ -11423,11 +11461,15 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);

 	if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
-		if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
+		if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] ||
+		    meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
 			struct btf *ret_btf;
 			u32 ret_btf_id;

-			if (unlikely(!bpf_global_ma_set))
+			if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
+				return -ENOMEM;
+
+			if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] && !bpf_global_percpu_ma_set)
 				return -ENOMEM;

 			if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
@@ -11440,13 +11482,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,

 			/* This may be NULL due to user not supplying a BTF */
 			if (!ret_btf) {
-				verbose(env, "bpf_obj_new requires prog BTF\n");
+				verbose(env, "bpf_obj_new/bpf_percpu_obj_new requires prog BTF\n");
 				return -EINVAL;
 			}

 			ret_t = btf_type_by_id(ret_btf, ret_btf_id);
 			if (!ret_t || !__btf_type_is_struct(ret_t)) {
-				verbose(env, "bpf_obj_new type ID argument must be of a struct\n");
+				verbose(env, "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct\n");
+				return -EINVAL;
+			}
+			if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] &&
+			    !__btf_type_is_scalar_struct(env, ret_btf, ret_t, 0)) {
+				verbose(env, "bpf_percpu_obj_new type ID argument must be of a struct of scalars\n");
 				return -EINVAL;
 			}

@@ -11454,6 +11501,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
 			regs[BPF_REG_0].btf = ret_btf;
 			regs[BPF_REG_0].btf_id = ret_btf_id;
+			if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl])
+				regs[BPF_REG_0].type |= MEM_PERCPU;

 			insn_aux->obj_new_size = ret_t->size;
 			insn_aux->kptr_struct_meta =
@@ -11594,7 +11643,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		regs[BPF_REG_0].id = ++env->id_gen;
 	} else if (btf_type_is_void(t)) {
 		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
-			if (meta.func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) {
+			if (meta.func_id == special_kfunc_list[KF_bpf_obj_drop_impl] ||
+			    meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl]) {
 				insn_aux->kptr_struct_meta =
 					btf_find_struct_meta(meta.arg_btf,
 							     meta.arg_btf_id);
@@ -18266,7 +18316,8 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	insn->imm = BPF_CALL_IMM(desc->addr);
 	if (insn->off)
 		return 0;
-	if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
+	if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl] ||
+	    desc->func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
 		u64 obj_new_size = env->insn_aux_data[insn_idx].obj_new_size;
@@ -18277,6 +18328,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		insn_buf[3] = *insn;
 		*cnt = 4;
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_obj_drop_impl] ||
+		   desc->func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]) {
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };

From patchwork Mon Aug 14 17:28:30 2023
X-Patchwork-Id: 13353132
From: Yonghong Song
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team@fb.com, Martin KaFai Lau
Subject: [PATCH bpf-next 04/15] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj
Date: Mon, 14 Aug 2023 10:28:30 -0700
Message-Id: <20230814172830.1363918-1-yonghong.song@linux.dev>
In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev>
References: <20230814172809.1361446-1-yonghong.song@linux.dev>

The bpf helpers bpf_this_cpu_ptr() and bpf_per_cpu_ptr() are re-purposed
for allocated percpu objects. For an allocated percpu obj, the reg type
is 'PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU'. The return type for these two
re-purposed helpers is 'PTR_TO_BTF_ID | MEM_RCU | MEM_ALLOC'. The
MEM_ALLOC flag allows the per-cpu data to be read and written.

Since the memory allocator bpf_mem_alloc() returns a ptr to a percpu ptr
for percpu data, the first argument of bpf_this_cpu_ptr() and
bpf_per_cpu_ptr() is patched with a dereference before being passed to
the helper func.
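A sketch of the program-side pattern these changes enable (assuming, as in
the earlier patches, that map_val->percpu_data is a struct val_t __percpu *
kptr field; not the literal selftest code):

    struct val_t __percpu *p;
    struct val_t *v;

    bpf_rcu_read_lock();
    p = map_val->percpu_data;       /* PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU */
    if (p) {
            /* returns PTR_TO_BTF_ID | MEM_RCU | MEM_ALLOC:
             * readable and writable, never NULL for this helper
             */
            v = bpf_this_cpu_ptr(p);
            v->d = 2;
    }
    bpf_rcu_read_unlock();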
Signed-off-by: Yonghong Song
---
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/verifier.c        | 51 ++++++++++++++++++++++++++++++------
 2 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f70f9ac884d2..e23480db37ec 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -480,6 +480,7 @@ struct bpf_insn_aux_data {
 	bool zext_dst; /* this insn zero extends dst reg */
 	bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
 	bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
+	bool percpu_ptr_prog_alloc; /* {this,per}_cpu_ptr() with prog alloc */
 	u8 alu_state; /* used in combination with alu_limit */

 	/* below fields are initialized once */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a985fbf18a11..6fc200cb68b6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6221,7 +6221,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	}

 	if (type_is_alloc(reg->type) && !type_is_non_owning_ref(reg->type) &&
-	    !reg->ref_obj_id) {
+	    !(reg->type & MEM_RCU) && !reg->ref_obj_id) {
 		verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
 		return -EFAULT;
 	}
@@ -7765,6 +7765,7 @@ static const struct bpf_reg_types btf_ptr_types = {
 static const struct bpf_reg_types percpu_btf_ptr_types = {
 	.types = {
 		PTR_TO_BTF_ID | MEM_PERCPU,
+		PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU,
 		PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
 	}
 };
@@ -7945,6 +7946,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		break;
 	case PTR_TO_BTF_ID | MEM_PERCPU:
+	case PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU:
 	case PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED:
 		/* Handled by helper specific checks */
 		break;
@@ -8287,6 +8289,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "Helper has invalid btf_id in R%d\n", regno);
 			return -EACCES;
 		}
+		if (reg->type & MEM_RCU) {
+			const struct btf_type *type = btf_type_by_id(reg->btf, reg->btf_id);
+
+			if (!type || !btf_type_is_struct(type)) {
+				verbose(env, "Helper has invalid btf/btf_id in R%d\n", regno);
+				return -EFAULT;
+			}
+			env->insn_aux_data[insn_idx].percpu_ptr_prog_alloc = true;
+		}
+
 		meta->ret_btf = reg->btf;
 		meta->ret_btf_id = reg->btf_id;
 		break;
@@ -9888,14 +9900,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
 			regs[BPF_REG_0].mem_size = tsize;
 		} else {
-			/* MEM_RDONLY may be carried from ret_flag, but it
-			 * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
-			 * it will confuse the check of PTR_TO_BTF_ID in
-			 * check_mem_access().
-			 */
-			ret_flag &= ~MEM_RDONLY;
+			if (env->insn_aux_data[insn_idx].percpu_ptr_prog_alloc) {
+				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU;
+			} else {
+				/* MEM_RDONLY may be carried from ret_flag, but it
+				 * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
+				 * it will confuse the check of PTR_TO_BTF_ID in
+				 * check_mem_access().
+				 */
+				ret_flag &= ~MEM_RDONLY;
+				regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
+			}

-			regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
 			regs[BPF_REG_0].btf = meta.ret_btf;
 			regs[BPF_REG_0].btf_id = meta.ret_btf_id;
 		}
@@ -18646,6 +18662,25 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto patch_call_imm;
 		}

+		/* bpf_per_cpu_ptr() and bpf_this_cpu_ptr() */
+		if (env->insn_aux_data[i + delta].percpu_ptr_prog_alloc) {
+			/* patch with 'r1 = *(u64 *)(r1 + 0)' since for percpu data,
+			 * bpf_mem_alloc() returns a ptr to the percpu data ptr.
+			 */
+			insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0);
+			insn_buf[1] = *insn;
+			cnt = 2;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta += cnt - 1;
+			env->prog = prog = new_prog;
+			insn = new_prog->insnsi + i + delta;
+			goto patch_call_imm;
+		}
+
 		/* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
 		 * and other inlining handlers are currently limited to 64 bit
 		 * only.

From patchwork Mon Aug 14 17:28:35 2023
X-Patchwork-Id: 13353134
From: Yonghong Song
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team@fb.com, Martin KaFai Lau
Subject: [PATCH bpf-next 05/15] selftests/bpf: Update error message in negative linked_list test
Date: Mon, 14 Aug 2023 10:28:35 -0700
Message-Id: <20230814172835.1364448-1-yonghong.song@linux.dev>
In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev>
References: <20230814172809.1361446-1-yonghong.song@linux.dev>

Some error messages have changed due to the addition of percpu kptr
support. Fix the linked_list test to match the changed error messages.
Signed-off-by: Yonghong Song
---
 tools/testing/selftests/bpf/prog_tests/linked_list.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 18cf7b17463d..db3bf6bbe01a 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -65,8 +65,8 @@ static struct {
 	{ "map_compat_raw_tp", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
 	{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
 	{ "obj_type_id_oor", "local type ID argument must be in range [0, U32_MAX]" },
-	{ "obj_new_no_composite", "bpf_obj_new type ID argument must be of a struct" },
-	{ "obj_new_no_struct", "bpf_obj_new type ID argument must be of a struct" },
+	{ "obj_new_no_composite", "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct" },
+	{ "obj_new_no_struct", "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct" },
 	{ "obj_drop_non_zero_off", "R1 must have zero offset when passed to release func" },
 	{ "new_null_ret", "R0 invalid mem access 'ptr_or_null_'" },
 	{ "obj_new_acq", "Unreleased reference id=" },

From patchwork Mon Aug 14 17:28:41 2023
X-Patchwork-Id: 13353133
From: Yonghong Song
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team@fb.com, Martin KaFai Lau
Subject: [PATCH bpf-next 06/15] libbpf: Add __percpu macro definition
Date: Mon, 14 Aug 2023 10:28:41 -0700
Message-Id: <20230814172841.1364680-1-yonghong.song@linux.dev>
In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev>
References: <20230814172809.1361446-1-yonghong.song@linux.dev>

Add a __percpu macro definition in bpf_helpers.h.
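With this definition, a BPF program can annotate a percpu kptr field the
same way the kernel-side BTF parsing from patch 02 expects. A minimal
sketch (struct val_t and struct elem mirror the selftests added later in
the series):

    #include <bpf/bpf_helpers.h>

    struct val_t { long b, c, d; };

    struct elem {
            /* emits BTF type tag "percpu"; the kernel parses this
             * field as BPF_KPTR_PERCPU_REF
             */
            struct val_t __percpu *pc;
    };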
Signed-off-by: Yonghong Song
---
 tools/lib/bpf/bpf_helpers.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index bbab9ad9dc5a..30f050c6b106 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -181,6 +181,7 @@ enum libbpf_tristate {
 #define __ksym __attribute__((section(".ksyms")))
 #define __kptr_untrusted __attribute__((btf_type_tag("kptr_untrusted")))
 #define __kptr __attribute__((btf_type_tag("kptr")))
+#define __percpu __attribute__((btf_type_tag("percpu")))

 #define bpf_ksym_exists(sym) ({ \
 	_Static_assert(!__builtin_constant_p(!!sym), #sym " should be marked as __weak"); \

From patchwork Mon Aug 14 17:28:46 2023
X-Patchwork-Id: 13353135
From: Yonghong Song
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team@fb.com, Martin KaFai Lau
Subject: [PATCH bpf-next 07/15] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h
Date: Mon, 14 Aug 2023 10:28:46 -0700
Message-Id: <20230814172846.1365407-1-yonghong.song@linux.dev>
In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev>
References: <20230814172809.1361446-1-yonghong.song@linux.dev>

The new macros bpf_percpu_obj_{new,drop}() are very similar to
bpf_obj_{new,drop}(), as they both take a type as the argument.

Signed-off-by: Yonghong Song
---
 .../testing/selftests/bpf/bpf_experimental.h | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 209811b1993a..9c5fe11520ae 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -131,4 +131,35 @@ extern int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *nod
  */
 extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;

+/* Description
+ *	Allocates a percpu object of the type represented by 'local_type_id' in
+ *	program BTF. User may use the bpf_core_type_id_local macro to pass the
+ *	type ID of a struct in program BTF.
+ *
+ *	The 'local_type_id' parameter must be a known constant.
+ *	The 'meta' parameter is rewritten by the verifier, no need for BPF
+ *	program to set it.
+ * Returns
+ *	A pointer to a percpu object of the type corresponding to the passed in
+ *	'local_type_id', or NULL on failure.
+ */
+extern void *bpf_percpu_obj_new_impl(__u64 local_type_id, void *meta) __ksym;
+
+/* Convenience macro to wrap over bpf_percpu_obj_new_impl */
+#define bpf_percpu_obj_new(type) ((type __percpu *)bpf_percpu_obj_new_impl(bpf_core_type_id_local(type), NULL))
+
+/* Description
+ *	Free an allocated percpu object. All fields of the object that require
+ *	destruction will be destructed before the storage is freed.
+ *
+ *	The 'meta' parameter is rewritten by the verifier, no need for BPF
+ *	program to set it.
+ * Returns
+ *	Void.
+ */
+extern void bpf_percpu_obj_drop_impl(void *kptr, void *meta) __ksym;
+
+/* Convenience macro to wrap over bpf_percpu_obj_drop_impl */
+#define bpf_percpu_obj_drop(kptr) bpf_percpu_obj_drop_impl(kptr, NULL)
+
 #endif

From patchwork Mon Aug 14 17:28:51 2023
X-Patchwork-Id: 13353136
From: Yonghong Song
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team@fb.com, Martin KaFai Lau
Subject: [PATCH bpf-next 08/15] selftests/bpf: Add tests for array map with local percpu kptr
Date: Mon, 14 Aug 2023 10:28:51 -0700
Message-Id: <20230814172851.1365752-1-yonghong.song@linux.dev>
In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev>
References: <20230814172809.1361446-1-yonghong.song@linux.dev>

Add non-sleepable and sleepable tests with percpu kptr. The non-sleepable
test executes four programs in the following order:
  1. allocate percpu data.
  2. assign values to percpu data.
  3. retrieve percpu data.
  4. de-allocate percpu data.

The sleepable prog tries to exercise all of the above 4 steps in a single
prog. For the sleepable prog, rcu_read_lock is needed to protect direct
percpu ptr access (from the map value) and the subsequent
bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers.
Signed-off-by: Yonghong Song
---
 .../selftests/bpf/prog_tests/percpu_alloc.c   |  78 ++++++++
 .../selftests/bpf/progs/percpu_alloc_array.c  | 187 ++++++++++++++++++
 2 files changed, 265 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
 create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_array.c

diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
new file mode 100644
index 000000000000..0fb536822f14
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "percpu_alloc_array.skel.h"
+
+static void test_array(void)
+{
+	struct percpu_alloc_array *skel;
+	int err, prog_fd;
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+	skel = percpu_alloc_array__open();
+	if (!ASSERT_OK_PTR(skel, "percpu_alloc_array__open"))
+		return;
+
+	bpf_program__set_autoload(skel->progs.test_array_map_1, true);
+	bpf_program__set_autoload(skel->progs.test_array_map_2, true);
+	bpf_program__set_autoload(skel->progs.test_array_map_3, true);
+	bpf_program__set_autoload(skel->progs.test_array_map_4, true);
+
+	skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+
+	err = percpu_alloc_array__load(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_array__load"))
+		goto out;
+
+	err = percpu_alloc_array__attach(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_array__attach"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.test_array_map_1);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run array_map 1-4");
+	ASSERT_EQ(topts.retval, 0, "test_run array_map 1-4");
+	ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d");
+	ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c");
+out:
+	percpu_alloc_array__destroy(skel);
+}
+
+static void test_array_sleepable(void)
+{
+	struct percpu_alloc_array *skel;
+	int err, prog_fd;
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+	skel = percpu_alloc_array__open();
+	if (!ASSERT_OK_PTR(skel, "percpu_alloc__open"))
+		return;
+
+	bpf_program__set_autoload(skel->progs.test_array_map_10, true);
+
+	skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+
+	err = percpu_alloc_array__load(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_array__load"))
+		goto out;
+
+	err = percpu_alloc_array__attach(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_array__attach"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.test_array_map_10);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run array_map_10");
+	ASSERT_EQ(topts.retval, 0, "test_run array_map_10");
+	ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d");
+	ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c");
+out:
+	percpu_alloc_array__destroy(skel);
+}
+
+void test_percpu_alloc(void)
+{
+	if (test__start_subtest("array"))
+		test_array();
+	if (test__start_subtest("array_sleepable"))
+		test_array_sleepable();
+}
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
new file mode 100644
index 000000000000..24903ea565a0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
@@ -0,0 +1,187 @@
+#include "bpf_experimental.h"
+
+struct val_t {
+	long b, c, d;
+};
+
+struct elem {
+	long sum;
+	struct val_t __percpu *pc;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, struct elem);
+} array SEC(".maps");
+
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+const volatile int nr_cpus;
+
+/* Initialize the percpu object */
+SEC("?fentry/bpf_fentry_test1")
+int BPF_PROG(test_array_map_1)
+{
+	struct val_t __percpu *p;
+	struct elem *e;
+	int index = 0;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	p = bpf_percpu_obj_new(struct val_t);
+	if (!p)
+		return 0;
+
+	p = bpf_kptr_xchg(&e->pc, p);
+	if (p)
+		bpf_percpu_obj_drop(p);
+
+	return 0;
+}
+
+/* Update percpu data */
+SEC("?fentry/bpf_fentry_test2")
+int BPF_PROG(test_array_map_2)
+{
+	struct val_t __percpu *p;
+	struct val_t *v;
+	struct elem *e;
+	int index = 0;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	p = e->pc;
+	if (!p)
+		return 0;
+
+	v = bpf_per_cpu_ptr(p, 0);
+	if (!v)
+		return 0;
+	v->c = 1;
+	v->d = 2;
+
+	return 0;
+}
+
+int cpu0_field_d, sum_field_c;
+
+/* Summarize percpu data */
+SEC("?fentry/bpf_fentry_test3")
+int BPF_PROG(test_array_map_3)
+{
+	struct val_t __percpu *p;
+	int i, index = 0;
+	struct val_t *v;
+	struct elem *e;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	p = e->pc;
+	if (!p)
+		return 0;
+
+	bpf_for(i, 0, nr_cpus) {
+		v = bpf_per_cpu_ptr(p, i);
+		if (v) {
+			if (i == 0)
+				cpu0_field_d = v->d;
+			sum_field_c += v->c;
+		}
+	}
+
+	return 0;
+}
+
+/* Explicitly free allocated percpu data */
+SEC("?fentry/bpf_fentry_test4")
+int BPF_PROG(test_array_map_4)
+{
+	struct val_t __percpu *p;
+	struct elem *e;
+	int index = 0;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	/* delete */
+	p = bpf_kptr_xchg(&e->pc, NULL);
+	if (p) {
+		bpf_percpu_obj_drop(p);
+	}
+
+	return 0;
+}
+
+SEC("?fentry.s/bpf_fentry_test1")
+int BPF_PROG(test_array_map_10)
+{
+	struct val_t __percpu *p, *p1;
+	int i, index = 0;
+	struct val_t *v;
+	struct elem *e;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	bpf_rcu_read_lock();
+	p = e->pc;
+	if (!p) {
+		p = bpf_percpu_obj_new(struct val_t);
+		if (!p)
+			goto out;
+
+		p1 = bpf_kptr_xchg(&e->pc, p);
+		if (p1) {
+			/* race condition */
+			bpf_percpu_obj_drop(p1);
+		}
+
+		p = e->pc;
+		if (!p)
+			goto out;
+	}
+
+	v = bpf_this_cpu_ptr(p);
+	v->c = 3;
+	v = bpf_this_cpu_ptr(p);
+	v->c = 0;
+
+	v = bpf_per_cpu_ptr(p, 0);
+	if (!v)
+		goto out;
+	v->c = 1;
+	v->d = 2;
+
+	/* delete */
+	p1 = bpf_kptr_xchg(&e->pc, NULL);
+	if (!p1)
+		goto out;
+
+	bpf_for(i, 0, nr_cpus) {
+		v = bpf_per_cpu_ptr(p, i);
+		if (v) {
+			if (i == 0)
+				cpu0_field_d = v->d;
+			sum_field_c += v->c;
+		}
+	}
+
+	/* finally release p */
+	bpf_percpu_obj_drop(p1);
+out:
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";

From patchwork Mon Aug 14 17:28:57 2023
X-Patchwork-Id: 13353138
From: Yonghong Song
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team@fb.com, Martin KaFai Lau
Subject: [PATCH bpf-next 09/15] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible
Date: Mon, 14 Aug 2023 10:28:57 -0700
Message-Id: <20230814172857.1366162-1-yonghong.song@linux.dev>
In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev>
References: <20230814172809.1361446-1-yonghong.song@linux.dev>

In the previous selftests/bpf patch, we have

    p = bpf_percpu_obj_new(struct val_t);
    if (!p)
            goto out;

    p1 = bpf_kptr_xchg(&e->pc, p);
    if (p1) {
            /* race condition */
            bpf_percpu_obj_drop(p1);
    }

    p = e->pc;
    if (!p)
            goto out;

After bpf_kptr_xchg(), we need to re-read e->pc into 'p'. This is because
the second argument of bpf_kptr_xchg() is marked OBJ_RELEASE and will be
marked as invalid after the call. So after bpf_kptr_xchg(), 'p' is an
unknown scalar, and the bpf program needs to re-read it from the map value.

This patch checks whether 'p' has type MEM_ALLOC and MEM_PERCPU, and
whether 'p' is RCU protected. If this is the case, 'p' can be marked as
MEM_RCU instead. MEM_ALLOC needs to be removed since 'p' is not an owning
reference any more. Such a change makes the re-read from the map value
unnecessary.

Note that re-reading 'e->pc' after bpf_kptr_xchg() might get a different
value from 'p': immediately before 'p = e->pc', another cpu may do another
bpf_kptr_xchg() and swap a different value into 'e->pc'. If this is the
case, 'p = e->pc' may get either 'p' or that other value, and the race
condition already exists. So removing the direct re-read seems fine too.
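In other words, with this change the pattern quoted above can drop the
re-read. A sketch of the flow that now verifies (same names as the
selftest; not the literal selftest diff):

    bpf_rcu_read_lock();
    p = bpf_percpu_obj_new(struct val_t);
    if (p) {
            p1 = bpf_kptr_xchg(&e->pc, p);
            if (p1)
                    bpf_percpu_obj_drop(p1);   /* drop the old value we raced with */
            /* 'p' is no longer owning after the xchg, but inside the rcu
             * critical section it is re-marked MEM_RCU instead of being
             * invalidated, so it can still be passed to bpf_this_cpu_ptr():
             */
            v = bpf_this_cpu_ptr(p);
            v->c = 1;
    }
    bpf_rcu_read_unlock();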
Signed-off-by: Yonghong Song --- kernel/bpf/verifier.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6fc200cb68b6..6fa458e13bfc 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -8854,8 +8854,15 @@ static int release_reference(struct bpf_verifier_env *env, return err; bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ - if (reg->ref_obj_id == ref_obj_id) - mark_reg_invalid(env, reg); + if (reg->ref_obj_id == ref_obj_id) { + if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) { + reg->ref_obj_id = 0; + reg->type &= ~MEM_ALLOC; + reg->type |= MEM_RCU; + } else { + mark_reg_invalid(env, reg); + } + } })); return 0; From patchwork Mon Aug 14 17:29:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yonghong Song X-Patchwork-Id: 13353137 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 52616C8F1 for ; Mon, 14 Aug 2023 17:29:11 +0000 (UTC) Received: from 69-171-232-181.mail-mxout.facebook.com (69-171-232-181.mail-mxout.facebook.com [69.171.232.181]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C076710DD for ; Mon, 14 Aug 2023 10:29:09 -0700 (PDT) Received: by devbig309.ftw3.facebook.com (Postfix, from userid 128203) id 3783024C22066; Mon, 14 Aug 2023 10:29:02 -0700 (PDT) From: Yonghong Song To: bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau Subject: [PATCH bpf-next 10/15] selftests/bpf: Remove unnecessary direct read of local percpu kptr Date: Mon, 14 Aug 2023 10:29:02 -0700 Message-Id: <20230814172902.1366613-1-yonghong.song@linux.dev> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev> References: <20230814172809.1361446-1-yonghong.song@linux.dev> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,RDNS_DYNAMIC,SPF_HELO_PASS,SPF_SOFTFAIL, TVD_RCVD_IP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net If the register type of the second argument of bpf_kptr_xchg() contains MEM_ALLOC and MEM_PERCPU, i.e., a percpu allocation, and the program is in an RCU critical section, the argument is re-marked as MEM_RCU and MEM_PERCPU after bpf_kptr_xchg(). This way, re-reading from the map value is not needed. Remove it from the percpu_alloc_array.c selftest. Without the previous kernel change, the test fails as below: 0: R1=ctx(off=0,imm=0) R10=fp0 ; int BPF_PROG(test_array_map_10, int a) 0: (b4) w1 = 0 ; R1_w=0 ; int i, index = 0; 1: (63) *(u32 *)(r10 -4) = r1 ; R1_w=0 R10=fp0 fp-8=0000????
2: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 ; 3: (07) r2 += -4 ; R2_w=fp-4 ; e = bpf_map_lookup_elem(&array, &index); 4: (18) r1 = 0xffff88810e771800 ; R1_w=map_ptr(off=0,ks=4,vs=16,imm=0) 6: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0) 7: (bf) r6 = r0 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0) R6_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0) ; if (!e) 8: (15) if r6 == 0x0 goto pc+81 ; R6_w=map_value(off=0,ks=4,vs=16,imm=0) ; bpf_rcu_read_lock(); 9: (85) call bpf_rcu_read_lock#87892 ; ; p = e->pc; 10: (bf) r7 = r6 ; R6=map_value(off=0,ks=4,vs=16,imm=0) R7_w=map_value(off=0,ks=4,vs=16,imm=0) 11: (07) r7 += 8 ; R7_w=map_value(off=8,ks=4,vs=16,imm=0) 12: (79) r6 = *(u64 *)(r6 +8) ; R6_w=percpu_rcu_ptr_or_null_val_t(id=2,off=0,imm=0) ; if (!p) { 13: (55) if r6 != 0x0 goto pc+13 ; R6_w=0 ; p = bpf_percpu_obj_new(struct val_t); 14: (18) r1 = 0x12 ; R1_w=18 16: (b7) r2 = 0 ; R2_w=0 17: (85) call bpf_percpu_obj_new_impl#87883 ; R0_w=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) refs=4 18: (bf) r6 = r0 ; R0=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) R6=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) refs=4 ; if (!p) 19: (15) if r6 == 0x0 goto pc+69 ; R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) refs=4 ; p1 = bpf_kptr_xchg(&e->pc, p); 20: (bf) r1 = r7 ; R1_w=map_value(off=8,ks=4,vs=16,imm=0) R7=map_value(off=8,ks=4,vs=16,imm=0) refs=4 21: (bf) r2 = r6 ; R2_w=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) refs=4 22: (85) call bpf_kptr_xchg#194 ; R0_w=percpu_ptr_or_null_val_t(id=6,ref_obj_id=6,off=0,imm=0) refs=6 ; if (p1) { 23: (15) if r0 == 0x0 goto pc+3 ; R0_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) refs=6 ; bpf_percpu_obj_drop(p1); 24: (bf) r1 = r0 ; R0_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) R1_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) refs=6 25: (b7) r2 = 0 ; R2_w=0 refs=6 26: (85) call bpf_percpu_obj_drop_impl#87882 ; ; v = bpf_this_cpu_ptr(p); 27: (bf) r1 = r6 ; R1_w=scalar(id=7) R6=scalar(id=7) 28: (85) call bpf_this_cpu_ptr#154 R1 type=scalar expected=percpu_ptr_, percpu_rcu_ptr_, percpu_trusted_ptr_ R1, which gets its value from R6, is a scalar. But before insn 22, R6 is percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0). Without the previous patch, its type is changed to a scalar at insn 22.
Signed-off-by: Yonghong Song --- tools/testing/selftests/bpf/progs/percpu_alloc_array.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c index 24903ea565a0..cde23eb7468e 100644 --- a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c +++ b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c @@ -146,10 +146,6 @@ int BPF_PROG(test_array_map_10) /* race condition */ bpf_percpu_obj_drop(p1); } - - p = e->pc; - if (!p) - goto out; } v = bpf_this_cpu_ptr(p); From patchwork Mon Aug 14 17:29:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yonghong Song X-Patchwork-Id: 13353155 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9D7FAC2FF for ; Mon, 14 Aug 2023 17:32:05 +0000 (UTC) Received: from 66-220-155-178.mail-mxout.facebook.com (66-220-155-178.mail-mxout.facebook.com [66.220.155.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2C030133 for ; Mon, 14 Aug 2023 10:32:03 -0700 (PDT) Received: by devbig309.ftw3.facebook.com (Postfix, from userid 128203) id B92CE24C22097; Mon, 14 Aug 2023 10:29:07 -0700 (PDT) From: Yonghong Song To: bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau Subject: [PATCH bpf-next 11/15] selftests/bpf: Add tests for cgrp_local_storage with local percpu kptr Date: Mon, 14 Aug 2023 10:29:07 -0700 Message-Id: <20230814172907.1367077-1-yonghong.song@linux.dev> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev> References: <20230814172809.1361446-1-yonghong.song@linux.dev> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,RDNS_DYNAMIC,SPF_HELO_PASS,SPF_SOFTFAIL, TVD_RCVD_IP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Add a non-sleepable cgrp_local_storage test with percpu kptr. The test allocates percpu data, assigns values to the percpu data, and then reads the data back. The percpu data is de-allocated when the map is freed.
Signed-off-by: Yonghong Song --- .../selftests/bpf/prog_tests/percpu_alloc.c | 40 +++++++ .../progs/percpu_alloc_cgrp_local_storage.c | 105 ++++++++++++++++++ 2 files changed, 145 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c index 0fb536822f14..41bf784a4bb3 100644 --- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c +++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include #include "percpu_alloc_array.skel.h" +#include "percpu_alloc_cgrp_local_storage.skel.h" static void test_array(void) { @@ -69,10 +70,49 @@ static void test_array_sleepable(void) percpu_alloc_array__destroy(skel); } +static void test_cgrp_local_storage(void) +{ + struct percpu_alloc_cgrp_local_storage *skel; + int err, cgroup_fd, prog_fd; + LIBBPF_OPTS(bpf_test_run_opts, topts); + + cgroup_fd = test__join_cgroup("/percpu_alloc"); + if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /percpu_alloc")) + return; + + skel = percpu_alloc_cgrp_local_storage__open(); + if (!ASSERT_OK_PTR(skel, "percpu_alloc_cgrp_local_storage__open")) + goto close_fd; + + skel->rodata->nr_cpus = libbpf_num_possible_cpus(); + + err = percpu_alloc_cgrp_local_storage__load(skel); + if (!ASSERT_OK(err, "percpu_alloc_cgrp_local_storage__load")) + goto destroy_skel; + + err = percpu_alloc_cgrp_local_storage__attach(skel); + if (!ASSERT_OK(err, "percpu_alloc_cgrp_local_storage__attach")) + goto destroy_skel; + + prog_fd = bpf_program__fd(skel->progs.test_cgrp_local_storage_1); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run cgrp_local_storage 1-3"); + ASSERT_EQ(topts.retval, 0, "test_run cgrp_local_storage 1-3"); + ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d"); + ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c"); + +destroy_skel: + percpu_alloc_cgrp_local_storage__destroy(skel); +close_fd: + close(cgroup_fd); +} + void test_percpu_alloc(void) { if (test__start_subtest("array")) test_array(); if (test__start_subtest("array_sleepable")) test_array_sleepable(); + if (test__start_subtest("cgrp_local_storage")) + test_cgrp_local_storage(); } diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c b/tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c new file mode 100644 index 000000000000..b79285ac770e --- /dev/null +++ b/tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c @@ -0,0 +1,105 @@ +#include "bpf_experimental.h" + +struct val_t { + long b, c, d; +}; + +struct elem { + long sum; + struct val_t __percpu *pc; +}; + +struct { + __uint(type, BPF_MAP_TYPE_CGRP_STORAGE); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, int); + __type(value, struct elem); +} cgrp SEC(".maps"); + +const volatile int nr_cpus; + +/* Initialize the percpu object */ +SEC("fentry/bpf_fentry_test1") +int BPF_PROG(test_cgrp_local_storage_1) +{ + struct task_struct *task; + struct val_t __percpu *p; + struct elem *e; + + task = bpf_get_current_task_btf(); + e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, + BPF_LOCAL_STORAGE_GET_F_CREATE); + if (!e) + return 0; + + p = bpf_percpu_obj_new(struct val_t); + if (!p) + return 0; + + p = bpf_kptr_xchg(&e->pc, p); + if (p) + bpf_percpu_obj_drop(p); + + return 0; +} + +/* Percpu data collection */ +SEC("fentry/bpf_fentry_test2") +int BPF_PROG(test_cgrp_local_storage_2) +{ + struct task_struct 
*task; + struct val_t __percpu *p; + struct val_t *v; + struct elem *e; + + task = bpf_get_current_task_btf(); + e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0); + if (!e) + return 0; + + p = e->pc; + if (!p) + return 0; + + v = bpf_per_cpu_ptr(p, 0); + if (!v) + return 0; + v->c = 1; + v->d = 2; + return 0; +} + +int cpu0_field_d, sum_field_c; + +/* Summarize percpu data collection */ +SEC("fentry/bpf_fentry_test3") +int BPF_PROG(test_cgrp_local_storage_3) +{ + struct task_struct *task; + struct val_t __percpu *p; + struct val_t *v; + struct elem *e; + int i; + + task = bpf_get_current_task_btf(); + e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0); + if (!e) + return 0; + + p = e->pc; + if (!p) + return 0; + + bpf_for(i, 0, nr_cpus) { + v = bpf_per_cpu_ptr(p, i); + if (v) { + if (i == 0) + cpu0_field_d = v->d; + sum_field_c += v->c; + } + } + + return 0; +} + +char _license[] SEC("license") = "GPL"; From patchwork Mon Aug 14 17:29:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yonghong Song X-Patchwork-Id: 13353140 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 473DAC2FF for ; Mon, 14 Aug 2023 17:29:28 +0000 (UTC) Received: from 66-220-155-178.mail-mxout.facebook.com (66-220-155-178.mail-mxout.facebook.com [66.220.155.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0873810D0 for ; Mon, 14 Aug 2023 10:29:26 -0700 (PDT) Received: by devbig309.ftw3.facebook.com (Postfix, from userid 128203) id 2CCBA24C220BF; Mon, 14 Aug 2023 10:29:12 -0700 (PDT) From: Yonghong Song To: bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau Subject: [PATCH bpf-next 12/15] bpf: Allow bpf_spin_lock and bpf_list_head in allocated percpu data structure Date: Mon, 14 Aug 2023 10:29:12 -0700 Message-Id: <20230814172912.1367692-1-yonghong.song@linux.dev> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev> References: <20230814172809.1361446-1-yonghong.song@linux.dev> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,RDNS_DYNAMIC,SPF_HELO_PASS,SPF_SOFTFAIL, TVD_RCVD_IP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Currently, bpf_percpu_obj_new() only allows a scalar struct: one whose members are scalars, or structs of scalars, up to 3 nested struct levels. For example, below is a linked list in the percpu object: struct val_t { long b, c, d; struct bpf_list_head head __contains(foo, node); struct bpf_spin_lock lock; }; struct map_val_t { ... struct val_t __percpu *pc; ... }; This patch adds verifier support for the above percpu struct with bpf_spin_lock and bpf_list_head. The change mostly adds MEM_RCU to various type-checking places, since the eventual percpu object is marked as MEM_RCU and is then used as the input to various helpers/kfuncs such as bpf_spin_lock(), bpf_spin_unlock() and bpf_list_push_back().
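For illustration, here is a minimal sketch (assumptions: the structs above plus an element type 'struct foo { long key, data; struct bpf_list_node node; };', and a map value 'e' already looked up; the selftest added later in this series exercises exactly this shape) of manipulating the list under the lock inside the percpu struct:

    struct val_t __percpu *p = e->pc; /* MEM_RCU | MEM_PERCPU under RCU */
    struct val_t *v;
    struct foo *f;

    if (!p)
            return 0;
    v = bpf_per_cpu_ptr(p, 0);
    if (!v)
            return 0;
    f = bpf_obj_new(struct foo);
    if (!f)
            return 0;
    bpf_spin_lock(&v->lock);          /* the lock lives in the percpu struct */
    bpf_list_push_back(&v->head, &f->node);
    bpf_spin_unlock(&v->lock);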
Another possible use case is bpf_rb_root, where an rb tree is in a percpu struct. The support can be added similarly in the future if needed. Signed-off-by: Yonghong Song --- kernel/bpf/verifier.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6fa458e13bfc..4c3045c8d25f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7848,6 +7848,8 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, type &= ~MEM_ALLOC; type &= ~MEM_PERCPU; } + if (meta->func_id == BPF_FUNC_spin_lock || meta->func_id == BPF_FUNC_spin_unlock) + type &= ~MEM_RCU; for (i = 0; i < ARRAY_SIZE(compatible->types); i++) { expected = compatible->types[i]; @@ -7929,6 +7931,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, } break; } + case PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU: case PTR_TO_BTF_ID | MEM_ALLOC: if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock && meta->func_id != BPF_FUNC_kptr_xchg) { @@ -8040,6 +8043,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env, case PTR_TO_BTF_ID | MEM_ALLOC: case PTR_TO_BTF_ID | PTR_TRUSTED: case PTR_TO_BTF_ID | MEM_RCU: + case PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU: case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF: /* When referenced PTR_TO_BTF_ID is passed to release function, * its fixed offset must be 0. In the other cases, fixed offset @@ -10604,8 +10608,8 @@ static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_o * active_lock.ptr = Register's type specific pointer * active_lock.id = A unique ID for each register pointer value * - * Currently, PTR_TO_MAP_VALUE and PTR_TO_BTF_ID | MEM_ALLOC are the two - * supported register types. + * Currently, PTR_TO_MAP_VALUE, PTR_TO_BTF_ID | MEM_ALLOC and + * PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU are the three supported register types. * * The active_lock.ptr in case of map values is the reg->map_ptr, and in case of * allocated objects is the reg->btf pointer.
@@ -10640,6 +10644,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ ptr = reg->map_ptr; break; case PTR_TO_BTF_ID | MEM_ALLOC: + case PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU: ptr = reg->btf; break; default: @@ -11140,7 +11145,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ break; case KF_ARG_PTR_TO_LIST_HEAD: if (reg->type != PTR_TO_MAP_VALUE && - reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { + reg->type != (PTR_TO_BTF_ID | MEM_ALLOC) && + reg->type != (PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU)) { verbose(env, "arg#%d expected pointer to map value or allocated object\n", i); return -EINVAL; } From patchwork Mon Aug 14 17:29:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yonghong Song X-Patchwork-Id: 13353139 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58F8FC2FF for ; Mon, 14 Aug 2023 17:29:24 +0000 (UTC) Received: from 69-171-232-180.mail-mxout.facebook.com (69-171-232-180.mail-mxout.facebook.com [69.171.232.180]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D31E133 for ; Mon, 14 Aug 2023 10:29:22 -0700 (PDT) Received: by devbig309.ftw3.facebook.com (Postfix, from userid 128203) id 55E8824C220F5; Mon, 14 Aug 2023 10:29:18 -0700 (PDT) From: Yonghong Song To: bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau Subject: [PATCH bpf-next 13/15] selftests/bpf: Add tests for percpu struct with bpf list head Date: Mon, 14 Aug 2023 10:29:18 -0700 Message-Id: <20230814172918.1369380-1-yonghong.song@linux.dev> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev> References: <20230814172809.1361446-1-yonghong.song@linux.dev> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,RDNS_DYNAMIC,SPF_HELO_PASS,SPF_SOFTFAIL, TVD_RCVD_IP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net This is to test nested special fields with the following pattern: struct val_t { long b, c, d; struct bpf_list_head head __contains(foo, node); struct bpf_spin_lock lock; }; struct map_val_t { ... struct val_t __percpu *pc; ... }; That is, the percpu data struct can hold a linked list. 
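A note on the annotation used above, for readers following the selftest below: __contains(foo, node) comes from the selftests' bpf_experimental.h and tells the verifier, via a BTF decl tag, that the list head holds 'struct foo' objects linked through their 'node' member. Roughly (paraphrasing that header, not something this patch defines):

    #define __contains(name, node) \
            __attribute__((btf_decl_tag("contains:" #name ":" #node)))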
Signed-off-by: Yonghong Song --- .../selftests/bpf/prog_tests/percpu_alloc.c | 40 ++++++ .../percpu_alloc_nested_special_fields.c | 121 ++++++++++++++++++ 2 files changed, 161 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_nested_special_fields.c diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c index 41bf784a4bb3..ee9dd495db7b 100644 --- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c +++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c @@ -2,6 +2,7 @@ #include #include "percpu_alloc_array.skel.h" #include "percpu_alloc_cgrp_local_storage.skel.h" +#include "percpu_alloc_nested_special_fields.skel.h" static void test_array(void) { @@ -107,6 +108,43 @@ static void test_cgrp_local_storage(void) close(cgroup_fd); } +static void test_nested_special_fields(void) +{ + struct percpu_alloc_nested_special_fields *skel; + int err, cgroup_fd, prog_fd; + LIBBPF_OPTS(bpf_test_run_opts, topts); + + cgroup_fd = test__join_cgroup("/percpu_alloc"); + if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /percpu_alloc")) + return; + + skel = percpu_alloc_nested_special_fields__open(); + if (!ASSERT_OK_PTR(skel, "percpu_alloc_nested_special_fields__open")) + goto close_fd; + + skel->rodata->nr_cpus = libbpf_num_possible_cpus(); + + err = percpu_alloc_nested_special_fields__load(skel); + if (!ASSERT_OK(err, "percpu_alloc_nested_special_fields__load")) + goto destroy_skel; + + err = percpu_alloc_nested_special_fields__attach(skel); + if (!ASSERT_OK(err, "percpu_alloc_nested_special_fields__attach")) + goto destroy_skel; + + prog_fd = bpf_program__fd(skel->progs.test_cgrp_local_storage_1); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run nested_special_fields 1-3"); + ASSERT_EQ(topts.retval, 0, "test_run nested_special_fields 1-3"); + ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d"); + ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c"); + +destroy_skel: + percpu_alloc_nested_special_fields__destroy(skel); +close_fd: + close(cgroup_fd); +} + void test_percpu_alloc(void) { if (test__start_subtest("array")) @@ -115,4 +153,6 @@ void test_percpu_alloc(void) test_array_sleepable(); if (test__start_subtest("cgrp_local_storage")) test_cgrp_local_storage(); + if (test__start_subtest("nested_special_fields")) + test_nested_special_fields(); } diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_nested_special_fields.c b/tools/testing/selftests/bpf/progs/percpu_alloc_nested_special_fields.c new file mode 100644 index 000000000000..d1a8e9b6b472 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/percpu_alloc_nested_special_fields.c @@ -0,0 +1,121 @@ +#include "bpf_experimental.h" + +struct foo { + long key, data; + struct bpf_list_node node; +}; + +struct val_t { + long b, c, d; + struct bpf_list_head head __contains(foo, node); + struct bpf_spin_lock lock; +}; + +struct elem { + long sum; + struct val_t __percpu *pc; +}; + +struct { + __uint(type, BPF_MAP_TYPE_CGRP_STORAGE); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, int); + __type(value, struct elem); +} cgrp SEC(".maps"); + +const volatile int nr_cpus; + +/* Initialize the percpu object */ +SEC("fentry/bpf_fentry_test1") +int BPF_PROG(test_cgrp_local_storage_1) +{ + struct task_struct *task; + struct val_t __percpu *p; + struct elem *e; + + task = bpf_get_current_task_btf(); + e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, + BPF_LOCAL_STORAGE_GET_F_CREATE); + if (!e) + return 0; + + p = 
bpf_percpu_obj_new(struct val_t); + if (!p) + return 0; + + p = bpf_kptr_xchg(&e->pc, p); + if (p) + bpf_percpu_obj_drop(p); + + return 0; +} + +/* Percpu data collection */ +SEC("fentry/bpf_fentry_test2") +int BPF_PROG(test_cgrp_local_storage_2) +{ + struct task_struct *task; + struct val_t __percpu *p; + struct val_t *v; + struct elem *e; + struct foo *f; + + task = bpf_get_current_task_btf(); + e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0); + if (!e) + return 0; + + p = e->pc; + if (!p) + return 0; + + v = bpf_per_cpu_ptr(p, 0); + if (!v) + return 0; + v->c = 1; + v->d = 2; + + f = bpf_obj_new(struct foo); + if (!f) + return 0; + bpf_spin_lock(&v->lock); + bpf_list_push_back(&v->head, &f->node); + bpf_spin_unlock(&v->lock); + + return 0; +} + +int cpu0_field_d, sum_field_c; + +/* Summarize percpu data collection */ +SEC("fentry/bpf_fentry_test3") +int BPF_PROG(test_cgrp_local_storage_3) +{ + struct task_struct *task; + struct val_t __percpu *p; + struct val_t *v; + struct elem *e; + int i; + + task = bpf_get_current_task_btf(); + e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0); + if (!e) + return 0; + + p = e->pc; + if (!p) + return 0; + + bpf_for(i, 0, nr_cpus) { + v = bpf_per_cpu_ptr(p, i); + if (v) { + if (i == 0) + cpu0_field_d = v->d; + sum_field_c += v->c; + } + } + + return 0; +} + +char _license[] SEC("license") = "GPL"; From patchwork Mon Aug 14 17:29:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yonghong Song X-Patchwork-Id: 13353141 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A7E2C2FF for ; Mon, 14 Aug 2023 17:29:40 +0000 (UTC) Received: from 66-220-155-179.mail-mxout.facebook.com (66-220-155-179.mail-mxout.facebook.com [66.220.155.179]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33D8B1723 for ; Mon, 14 Aug 2023 10:29:35 -0700 (PDT) Received: by devbig309.ftw3.facebook.com (Postfix, from userid 128203) id 81EC324C2212C; Mon, 14 Aug 2023 10:29:23 -0700 (PDT) From: Yonghong Song To: bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau Subject: [PATCH bpf-next 14/15] selftests/bpf: Add some negative tests Date: Mon, 14 Aug 2023 10:29:23 -0700 Message-Id: <20230814172923.1371368-1-yonghong.song@linux.dev> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev> References: <20230814172809.1361446-1-yonghong.song@linux.dev> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,RDNS_DYNAMIC,SPF_HELO_PASS,SPF_SOFTFAIL, TVD_RCVD_IP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Add a few negative tests for common mistakes when using percpu kptr. Other similar error cases occurring with bpf_obj_new/drop() are not included here.
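A note on the test mechanics (a sketch of the existing selftest convention from bpf_misc.h and the RUN_TESTS() harness, not something this patch introduces): each negative program is annotated with BTF decl tags that the runner matches against the verifier log, roughly:

    #define __failure  __attribute__((btf_decl_tag("comment:test_expect_failure")))
    #define __msg(msg) __attribute__((btf_decl_tag("comment:test_expect_msg=" msg)))

RUN_TESTS(percpu_alloc_fail) then loads each annotated program, expects verification to fail, and asserts that the verifier log contains each __msg() string.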
Signed-off-by: Yonghong Song --- .../selftests/bpf/prog_tests/percpu_alloc.c | 7 ++ .../selftests/bpf/progs/percpu_alloc_fail.c | 100 ++++++++++++++++++ 2 files changed, 107 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c index ee9dd495db7b..17ba3159cea7 100644 --- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c +++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c @@ -3,6 +3,7 @@ #include "percpu_alloc_array.skel.h" #include "percpu_alloc_cgrp_local_storage.skel.h" #include "percpu_alloc_nested_special_fields.skel.h" +#include "percpu_alloc_fail.skel.h" static void test_array(void) { @@ -145,6 +146,10 @@ static void test_nested_special_fields(void) close(cgroup_fd); } +static void test_failure(void) { + RUN_TESTS(percpu_alloc_fail); +} + void test_percpu_alloc(void) { if (test__start_subtest("array")) @@ -155,4 +160,6 @@ void test_percpu_alloc(void) test_cgrp_local_storage(); if (test__start_subtest("nested_special_fields")) test_nested_special_fields(); + if (test__start_subtest("failure_tests")) + test_failure(); } diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_fail.c b/tools/testing/selftests/bpf/progs/percpu_alloc_fail.c new file mode 100644 index 000000000000..47bdf6f857cb --- /dev/null +++ b/tools/testing/selftests/bpf/progs/percpu_alloc_fail.c @@ -0,0 +1,100 @@ +#include "bpf_experimental.h" +#include "bpf_misc.h" + +struct val_t { + long b, c, d; +}; + +struct val2_t { + long b; +}; + +struct elem { + long sum; + struct val_t __percpu *pc; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct elem); +} array SEC(".maps"); + +long ret; + +SEC("?fentry/bpf_fentry_test1") +__failure __msg("store to referenced kptr disallowed") +int BPF_PROG(test_array_map_1) +{ + struct val_t __percpu *p; + struct elem *e; + int index = 0; + + e = bpf_map_lookup_elem(&array, &index); + if (!e) + return 0; + + p = bpf_percpu_obj_new(struct val_t); + if (!p) + return 0; + + p = bpf_kptr_xchg(&e->pc, p); + if (p) + bpf_percpu_obj_drop(p); + + e->pc = (struct val_t __percpu *)ret; + return 0; +} + +SEC("?fentry/bpf_fentry_test1") +__failure __msg("invalid kptr access, R2 type=percpu_ptr_val2_t expected=ptr_val_t") +int BPF_PROG(test_array_map_2) +{ + struct val2_t __percpu *p2; + struct val_t __percpu *p; + struct elem *e; + int index = 0; + + e = bpf_map_lookup_elem(&array, &index); + if (!e) + return 0; + + p2 = bpf_percpu_obj_new(struct val2_t); + if (!p2) + return 0; + + p = bpf_kptr_xchg(&e->pc, p2); + if (p) + bpf_percpu_obj_drop(p); + + return 0; +} + +SEC("?fentry.s/bpf_fentry_test1") +__failure __msg("R1 type=scalar expected=percpu_ptr_, percpu_rcu_ptr_, percpu_trusted_ptr_") +int BPF_PROG(test_array_map_3) +{ + struct val_t __percpu *p, *p1; + struct val_t *v; + struct elem *e; + int index = 0; + + e = bpf_map_lookup_elem(&array, &index); + if (!e) + return 0; + + p = bpf_percpu_obj_new(struct val_t); + if (!p) + return 0; + + p1 = bpf_kptr_xchg(&e->pc, p); + if (p1) + bpf_percpu_obj_drop(p1); + + v = bpf_this_cpu_ptr(p); + ret = v->b; + return 0; +} + +char _license[] SEC("license") = "GPL"; From patchwork Mon Aug 14 17:29:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yonghong Song X-Patchwork-Id: 13353142 X-Patchwork-Delegate: bpf@iogearbox.net Received: from 
lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A29CAC2FF for ; Mon, 14 Aug 2023 17:29:40 +0000 (UTC) Received: from 66-220-155-178.mail-mxout.facebook.com (66-220-155-178.mail-mxout.facebook.com [66.220.155.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3A7C81726 for ; Mon, 14 Aug 2023 10:29:36 -0700 (PDT) Received: by devbig309.ftw3.facebook.com (Postfix, from userid 128203) id ACF0F24C22195; Mon, 14 Aug 2023 10:29:28 -0700 (PDT) From: Yonghong Song To: bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau Subject: [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Date: Mon, 14 Aug 2023 10:29:28 -0700 Message-Id: <20230814172928.1373311-1-yonghong.song@linux.dev> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230814172809.1361446-1-yonghong.song@linux.dev> References: <20230814172809.1361446-1-yonghong.song@linux.dev> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-0.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,RDNS_DYNAMIC,SPF_HELO_PASS,SPF_SOFTFAIL, TVD_RCVD_IP,UPPERCASE_50_75 autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu kptr' can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated. Signed-off-by: Yonghong Song --- include/uapi/linux/bpf.h | 9 ++++++++- tools/include/uapi/linux/bpf.h | 9 ++++++++- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d21deb46f49f..5d1bb6b42ea2 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -932,7 +932,14 @@ enum bpf_map_type { */ BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED, + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs + * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE + + * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE + * functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE + * deprecated. + */ + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED, BPF_MAP_TYPE_QUEUE, BPF_MAP_TYPE_STACK, BPF_MAP_TYPE_SK_STORAGE, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index d21deb46f49f..5d1bb6b42ea2 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -932,7 +932,14 @@ enum bpf_map_type { */ BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED, + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs + * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE + + * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE + * functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE + * deprecated.
+ */ + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED, BPF_MAP_TYPE_QUEUE, BPF_MAP_TYPE_STACK, BPF_MAP_TYPE_SK_STORAGE,
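To make the suggested migration concrete, here is a minimal sketch (names such as 'on_event', 'val_t' and 'cgrp' are illustrative; the pattern is condensed from the cgrp_local_storage selftest earlier in this series and assumes this series' kernel changes) of replacing a BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map with BPF_MAP_TYPE_CGRP_STORAGE plus a local percpu kptr:

    #include "bpf_experimental.h"

    struct val_t { long cnt; };

    struct elem { struct val_t __percpu *pc; };

    struct {
            __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
            __uint(map_flags, BPF_F_NO_PREALLOC);
            __type(key, int);
            __type(value, struct elem);
    } cgrp SEC(".maps");

    SEC("fentry/bpf_fentry_test1")
    int BPF_PROG(on_event)
    {
            struct task_struct *task = bpf_get_current_task_btf();
            struct val_t __percpu *p, *old;
            struct val_t *v;
            struct elem *e;

            e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0,
                                     BPF_LOCAL_STORAGE_GET_F_CREATE);
            if (!e)
                    return 0;

            p = e->pc;
            if (!p) {
                    /* allocate on first use */
                    p = bpf_percpu_obj_new(struct val_t);
                    if (!p)
                            return 0;
                    old = bpf_kptr_xchg(&e->pc, p);
                    if (old)
                            bpf_percpu_obj_drop(old); /* lost the race */
                    /* patch 09 keeps 'p' usable here as MEM_RCU | MEM_PERCPU;
                     * non-sleepable progs are implicitly in an RCU section.
                     */
            }

            v = bpf_this_cpu_ptr(p); /* this cpu's private slot, no contention */
            v->cnt++;
            return 0;
    }

    char _license[] SEC("license") = "GPL";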