From patchwork Thu Mar 27 08:34:40 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030837
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1@huawei.com
Subject: [PATCH bpf-next v3 01/16] bpf: Introduce BPF_DYNPTR and helpers to facilitate its parsing
Date: Thu, 27 Mar 2025 16:34:40 +0800
Message-Id: <20250327083455.848708-2-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>
References: <20250327083455.848708-1-houtao@huaweicloud.com>
From: Hou Tao

Add BPF_DYNPTR to btf_field_type to support bpf_dynptr in map keys. The
parsing of bpf_dynptr in BTF will be done in a following patch; this
patch only adds two helpers: btf_new_bpf_dynptr_record(), which creates
a btf record that includes only a bpf_dynptr, and btf_type_is_dynptr(),
which checks whether a btf_type is a bpf_dynptr or not.

Signed-off-by: Hou Tao
---
 include/linux/bpf.h |  1 +
 include/linux/btf.h |  2 ++
 kernel/bpf/btf.c    | 44 ++++++++++++++++++++++++++++++++++++++------
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2083905a4e9fa..0b65c98d8b7d5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -206,6 +206,7 @@ enum btf_field_type {
 	BPF_WORKQUEUE = (1 << 10),
 	BPF_UPTR = (1 << 11),
 	BPF_RES_SPIN_LOCK = (1 << 12),
+	BPF_DYNPTR = (1 << 13),
 };
 
 typedef void (*btf_dtor_kfunc_t)(void *);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index ebc0c0c9b9446..2ab48b377d312 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -226,8 +226,10 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   u32 expected_offset, u32 expected_size);
 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
 				    u32 field_mask, u32 value_size);
+struct btf_record *btf_new_bpf_dynptr_record(void);
 int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec);
 bool btf_type_is_void(const struct btf_type *t);
+bool btf_type_is_dynptr(const struct btf *btf, const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 16ba36f34dfab..1054a1e27e9d3 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3939,6 +3939,17 @@ static int btf_field_cmp(const void *_a, const void *_b, const void *priv)
 	return 0;
 }
 
+static void btf_init_record(struct btf_record *record)
+{
+	record->cnt = 0;
+	record->field_mask = 0;
+	record->spin_lock_off = -EINVAL;
+	record->res_spin_lock_off = -EINVAL;
+	record->timer_off = -EINVAL;
+	record->wq_off = -EINVAL;
+	record->refcount_off = -EINVAL;
+}
+
 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
 				    u32 field_mask, u32 value_size)
 {
@@ -3957,15 +3968,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 	/* This needs to be kzalloc to zero out padding and unused fields, see
 	 * comment in btf_record_equal.
 	 */
-	rec = kzalloc(offsetof(struct btf_record, fields[cnt]), GFP_KERNEL | __GFP_NOWARN);
+	rec = kzalloc(struct_size(rec, fields, cnt), GFP_KERNEL | __GFP_NOWARN);
 	if (!rec)
 		return ERR_PTR(-ENOMEM);
 
-	rec->spin_lock_off = -EINVAL;
-	rec->res_spin_lock_off = -EINVAL;
-	rec->timer_off = -EINVAL;
-	rec->wq_off = -EINVAL;
-	rec->refcount_off = -EINVAL;
+	btf_init_record(rec);
 	for (i = 0; i < cnt; i++) {
 		field_type_size = btf_field_type_size(info_arr[i].type);
 		if (info_arr[i].off + field_type_size > value_size) {
@@ -4067,6 +4074,25 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 	return ERR_PTR(ret);
 }
 
+struct btf_record *btf_new_bpf_dynptr_record(void)
+{
+	struct btf_record *record;
+
+	record = kzalloc(struct_size(record, fields, 1), GFP_KERNEL | __GFP_NOWARN);
+	if (!record)
+		return ERR_PTR(-ENOMEM);
+
+	btf_init_record(record);
+
+	record->cnt = 1;
+	record->field_mask = BPF_DYNPTR;
+	record->fields[0].offset = 0;
+	record->fields[0].size = sizeof(struct bpf_dynptr);
+	record->fields[0].type = BPF_DYNPTR;
+
+	return record;
+}
+
 int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 {
 	int i;
@@ -7562,6 +7588,12 @@ static bool btf_is_dynptr_ptr(const struct btf *btf, const struct btf_type *t)
 	return false;
 }
 
+bool btf_type_is_dynptr(const struct btf *btf, const struct btf_type *t)
+{
+	return __btf_type_is_struct(t) && t->size == sizeof(struct bpf_dynptr) &&
+	       !strcmp(__btf_name_by_offset(btf, t->name_off), "bpf_dynptr");
+}
+
 struct bpf_cand_cache {
 	const char *name;
 	u32 name_len;

From patchwork Thu Mar 27 08:34:41 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030847
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1@huawei.com
Subject: [PATCH bpf-next v3 02/16] bpf: Parse bpf_dynptr in map key
Date: Thu, 27 Mar 2025 16:34:41 +0800
Message-Id: <20250327083455.848708-3-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>
References: <20250327083455.848708-1-houtao@huaweicloud.com>
From: Hou Tao

To support variable-length keys or strings in a map key, use bpf_dynptr
to represent these variable-length objects and save the bpf_dynptr
fields in the map key. As shown in the example below, a map key with an
integer and a string is defined:

	struct pid_name {
		int pid;
		struct bpf_dynptr name;
	};

The bpf_dynptr in the map key could also be contained indirectly in a
struct, as shown below:

	struct pid_name_time {
		struct pid_name process;
		unsigned long long time;
	};

It is also fine to have multiple bpf_dynptrs in the map key, as shown
below. The maximum number of bpf_dynptrs in a map key is limited to 2,
and the limitation can be lifted if necessary:

	struct pid_name_tag {
		struct pid_name process;
		struct bpf_dynptr tag;
	};

If the whole map key is a bpf_dynptr, the map key could be defined as a
struct wrapping the bpf_dynptr, or directly use bpf_dynptr as the map
key:

	struct map_key {
		struct bpf_dynptr name;
	};

The bpf program could use bpf_dynptr_init() to initialize the dynptr
part of the map key, and the userspace application will use
bpf_dynptr_user_init() or a similar API to initialize the dynptr. Just
like kptrs in a map value, the bpf_dynptr field in the map key could
also be defined in a nested struct which is contained in the map key
struct.

The patch updates map_create() accordingly to parse these bpf_dynptr
fields in the map key, just like it does for other special fields in the
map value.
These special fields are saved in the newly-added key_record field of
bpf_map. Considering that both key_record and key_size are used during
the lookup procedure, place key_record in the same cacheline as key_size
and move the cold map_extra to the next cacheline.

At present, only the BPF_MAP_TYPE_HASH map will support bpf_dynptr in
its key, and the support will be enabled later when its implementation
is ready.

Signed-off-by: Hou Tao
---
 include/linux/bpf.h     | 12 +++++++++++-
 kernel/bpf/btf.c        |  4 ++++
 kernel/bpf/map_in_map.c | 21 +++++++++++++++++----
 kernel/bpf/syscall.c    | 40 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0b65c98d8b7d5..e25ff78f1fabf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -271,10 +271,13 @@ struct bpf_map {
 	u32 key_size;
 	u32 value_size;
 	u32 max_entries;
-	u64 map_extra; /* any per-map-type extra fields */
 	u32 map_flags;
 	u32 id;
+	/* BTF record for special fields in map Key. Only allow bpf_dynptr */
+	struct btf_record *key_record;
+	/* BTF record for special fields in map Value. Disallow bpf_dynptr. */
 	struct btf_record *record;
+	u64 map_extra; /* any per-map-type extra fields */
 	int numa_node;
 	u32 btf_key_type_id;
 	u32 btf_value_type_id;
@@ -341,6 +344,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 		return "bpf_rb_node";
 	case BPF_REFCOUNT:
 		return "bpf_refcount";
+	case BPF_DYNPTR:
+		return "bpf_dynptr";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -373,6 +378,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(struct bpf_rb_node);
 	case BPF_REFCOUNT:
 		return sizeof(struct bpf_refcount);
+	case BPF_DYNPTR:
+		return sizeof(struct bpf_dynptr);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -405,6 +412,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(struct bpf_rb_node);
 	case BPF_REFCOUNT:
 		return __alignof__(struct bpf_refcount);
+	case BPF_DYNPTR:
+		return __alignof__(struct bpf_dynptr);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -436,6 +445,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
 	case BPF_KPTR_REF:
 	case BPF_KPTR_PERCPU:
 	case BPF_UPTR:
+	case BPF_DYNPTR:
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 1054a1e27e9d3..c3c28ecf6bf09 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3513,6 +3513,7 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_
 	field_mask_test_name(BPF_RB_ROOT, "bpf_rb_root");
 	field_mask_test_name(BPF_RB_NODE, "bpf_rb_node");
 	field_mask_test_name(BPF_REFCOUNT, "bpf_refcount");
+	field_mask_test_name(BPF_DYNPTR, "bpf_dynptr");
 
 	/* Only return BPF_KPTR when all other types with matchable names fail */
 	if (field_mask & (BPF_KPTR | BPF_UPTR) && !__btf_type_is_struct(var_type)) {
@@ -3551,6 +3552,7 @@ static int btf_repeat_fields(struct btf_field_info *info, int info_cnt,
 	case BPF_UPTR:
 	case BPF_LIST_HEAD:
 	case BPF_RB_ROOT:
+	case BPF_DYNPTR:
 		break;
 	default:
 		return -EINVAL;
@@ -3674,6 +3676,7 @@ static int btf_find_field_one(const struct btf *btf,
 	case BPF_LIST_NODE:
 	case BPF_RB_NODE:
 	case BPF_REFCOUNT:
+	case BPF_DYNPTR:
 		ret = btf_find_struct(btf, var_type, off, sz, field_type,
 				      info_cnt ? &info[0] : &tmp);
 		if (ret < 0)
@@ -4037,6 +4040,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			break;
 		case BPF_LIST_NODE:
 		case BPF_RB_NODE:
+		case BPF_DYNPTR:
 			break;
 		default:
 			ret = -EFAULT;
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 645bd30bc9a9d..564ebcc857564 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -12,6 +12,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 	struct bpf_map *inner_map, *inner_map_meta;
 	u32 inner_map_meta_size;
 	CLASS(fd, f)(inner_map_ufd);
+	int ret;
 
 	inner_map = __bpf_map_get(f);
 	if (IS_ERR(inner_map))
@@ -45,10 +46,15 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 		 * invalid/empty/valid, but ERR_PTR in case of errors. During
 		 * equality NULL or IS_ERR is equivalent.
 		 */
-		struct bpf_map *ret = ERR_CAST(inner_map_meta->record);
-		kfree(inner_map_meta);
-		return ret;
+		ret = PTR_ERR(inner_map_meta->record);
+		goto free_meta;
 	}
+	inner_map_meta->key_record = btf_record_dup(inner_map->key_record);
+	if (IS_ERR(inner_map_meta->key_record)) {
+		ret = PTR_ERR(inner_map_meta->key_record);
+		goto free_record;
+	}
+
 	/* Note: We must use the same BTF, as we also used btf_record_dup above
 	 * which relies on BTF being same for both maps, as some members like
 	 * record->fields.list_head have pointers like value_rec pointing into
@@ -71,6 +77,12 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 		inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1;
 	}
 	return inner_map_meta;
+
+free_record:
+	btf_record_free(inner_map_meta->record);
+free_meta:
+	kfree(inner_map_meta);
+	return ERR_PTR(ret);
 }
 
 void bpf_map_meta_free(struct bpf_map *map_meta)
@@ -88,7 +100,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 	       meta0->key_size == meta1->key_size &&
 	       meta0->value_size == meta1->value_size &&
 	       meta0->map_flags == meta1->map_flags &&
-	       btf_record_equal(meta0->record, meta1->record);
+	       btf_record_equal(meta0->record, meta1->record) &&
+	       btf_record_equal(meta0->key_record, meta1->key_record);
 }
 
 void *bpf_map_fd_get_ptr(struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9794446bc8c6c..9ded3ba82d356 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -669,6 +669,7 @@ void btf_record_free(struct btf_record *rec)
 		case BPF_TIMER:
 		case BPF_REFCOUNT:
 		case BPF_WORKQUEUE:
+		case BPF_DYNPTR:
 			/* Nothing to release */
 			break;
 		default:
@@ -682,7 +683,9 @@ void btf_record_free(struct btf_record *rec)
 void bpf_map_free_record(struct bpf_map *map)
 {
 	btf_record_free(map->record);
+	btf_record_free(map->key_record);
 	map->record = NULL;
+	map->key_record = NULL;
 }
 
 struct btf_record *btf_record_dup(const struct btf_record *rec)
@@ -722,6 +725,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 		case BPF_TIMER:
 		case BPF_REFCOUNT:
 		case BPF_WORKQUEUE:
+		case BPF_DYNPTR:
 			/* Nothing to acquire */
 			break;
 		default:
@@ -841,6 +845,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 		case BPF_RB_NODE:
 		case BPF_REFCOUNT:
 			break;
+		case BPF_DYNPTR:
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			continue;
@@ -850,6 +856,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 
 static void bpf_map_free(struct bpf_map *map)
 {
+	struct btf_record *key_rec = map->key_record;
 	struct btf_record *rec = map->record;
 	struct btf *btf = map->btf;
 
@@ -870,6 +877,7 @@ static void bpf_map_free(struct bpf_map *map)
 	 * eventually calls bpf_map_free_meta, since inner_map_meta is only a
 	 * template bpf_map struct used during verification.
 	 */
+	btf_record_free(key_rec);
 	btf_record_free(rec);
 	/* Delay freeing of btf for maps, as map_free callback may need
 	 * struct_meta info which will be freed with btf_put().
@@ -1209,6 +1217,8 @@ int map_check_no_btf(const struct bpf_map *map,
 	return -ENOTSUPP;
 }
 
+#define MAX_DYNPTR_CNT_IN_MAP_KEY 2
+
 static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
 			 const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
 {
@@ -1231,6 +1241,36 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
 	if (!value_type || value_size != map->value_size)
 		return -EINVAL;
 
+	/* Key BTF type can't be data section */
+	if (btf_type_is_dynptr(btf, key_type))
+		map->key_record = btf_new_bpf_dynptr_record();
+	else if (__btf_type_is_struct(key_type))
+		map->key_record = btf_parse_fields(btf, key_type, BPF_DYNPTR, map->key_size);
+	else
+		map->key_record = NULL;
+	if (!IS_ERR_OR_NULL(map->key_record)) {
+		if (map->key_record->cnt > MAX_DYNPTR_CNT_IN_MAP_KEY) {
+			ret = -E2BIG;
+			goto free_map_tab;
+		}
+		if (!bpf_token_capable(token, CAP_BPF)) {
+			ret = -EPERM;
+			goto free_map_tab;
+		}
+		/* Disallow key with dynptr for special map */
+		if (map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) {
+			ret = -EACCES;
+			goto free_map_tab;
+		}
+		/* Enable for BPF_MAP_TYPE_HASH later */
+		ret = -EOPNOTSUPP;
+		goto free_map_tab;
+	} else if (IS_ERR(map->key_record)) {
+		/* Return an error early even if the bpf program doesn't use it */
+		ret = PTR_ERR(map->key_record);
+		goto free_map_tab;
+	}
+
 	map->record = btf_parse_fields(btf, value_type,
 				       BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
 				       BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR,

From patchwork Thu Mar 27 08:34:42 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030838
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1@huawei.com
Subject: [PATCH bpf-next v3 03/16] bpf: Add helper bpf_map_has_dynptr_key()
Date: Thu, 27 Mar 2025 16:34:42 +0800
Message-Id: <20250327083455.848708-4-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>
References: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

Given that only bpf_dynptr is allowed in key_record, simply check
whether map->key_record is non-NULL to detect whether bpf_dynptr is
enabled in the map key.
Signed-off-by: Hou Tao
---
 include/linux/bpf.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e25ff78f1fabf..737890e5c58b4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -316,6 +316,11 @@ struct bpf_map {
 	s64 __percpu *elem_count;
 };
 
+static inline bool bpf_map_has_dynptr_key(const struct bpf_map *map)
+{
+	return !!map->key_record;
+}
+
 static inline const char *btf_field_type_name(enum btf_field_type type)
 {
 	switch (type) {

From patchwork Thu Mar 27 08:34:43 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030842
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1@huawei.com
Subject: [PATCH bpf-next v3 04/16] bpf: Split check_stack_range_initialized() into small functions
Date: Thu, 27 Mar 2025 16:34:43 +0800
Message-Id: <20250327083455.848708-5-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>
References: <20250327083455.848708-1-houtao@huaweicloud.com>
From: Hou Tao

This is a preparatory patch for supporting a map key with bpf_dynptr in
the verifier. It splits check_stack_range_initialized() into multiple
small functions, and the following patch will reuse these functions to
check whether the access of a stack range which contains a bpf_dynptr
is valid or not.

Besides the splitting of check_stack_range_initialized(), the patch
also renames it to check_stack_range_access() to better reflect its
purpose, because the function also allows uninitialized stack ranges.

Signed-off-by: Hou Tao
---
 kernel/bpf/verifier.c | 209 ++++++++++++++++++++++++------------------
 1 file changed, 121 insertions(+), 88 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 86a2a6408f5ae..9d611d5152789 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -798,7 +798,7 @@ static void invalidate_dynptr(struct bpf_verifier_env *env, struct bpf_func_stat
  * While we don't allow reading STACK_INVALID, it is still possible to
  * do <8 byte writes marking some but not all slots as STACK_MISC. Then,
  * helpers or insns can do partial read of that part without failing,
- * but check_stack_range_initialized, check_stack_read_var_off, and
+ * but check_stack_range_access, check_stack_read_var_off, and
  * check_stack_read_fixed_off will do mark_reg_read for all 8-bytes of
  * the slot conservatively. Hence we need to prevent those liveness
  * marking walks.
@@ -5430,11 +5430,11 @@ enum bpf_access_src {
 	ACCESS_HELPER = 2, /* the access is performed by a helper */
 };
 
-static int check_stack_range_initialized(struct bpf_verifier_env *env,
-					 int regno, int off, int access_size,
-					 bool zero_size_allowed,
-					 enum bpf_access_type type,
-					 struct bpf_call_arg_meta *meta);
+static int check_stack_range_access(struct bpf_verifier_env *env,
+				    int regno, int off, int access_size,
+				    bool zero_size_allowed,
+				    enum bpf_access_type type,
+				    struct bpf_call_arg_meta *meta);
 
 static struct bpf_reg_state *reg_state(struct bpf_verifier_env *env, int regno)
 {
@@ -5465,8 +5465,8 @@ static int check_stack_read_var_off(struct bpf_verifier_env *env,
 	/* Note that we pass a NULL meta, so raw access will not be permitted.
 	 */
-	err = check_stack_range_initialized(env, ptr_regno, off, size,
-					    false, BPF_READ, NULL);
+	err = check_stack_range_access(env, ptr_regno, off, size,
+				       false, BPF_READ, NULL);
 	if (err)
 		return err;
 
@@ -7870,44 +7870,13 @@ static int check_atomic(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	}
 }
 
-/* When register 'regno' is used to read the stack (either directly or through
- * a helper function) make sure that it's within stack boundary and, depending
- * on the access type and privileges, that all elements of the stack are
- * initialized.
- *
- * 'off' includes 'regno->off', but not its dynamic part (if any).
- *
- * All registers that have been spilled on the stack in the slots within the
- * read offsets are marked as read.
- */
-static int check_stack_range_initialized(
-		struct bpf_verifier_env *env, int regno, int off,
-		int access_size, bool zero_size_allowed,
-		enum bpf_access_type type, struct bpf_call_arg_meta *meta)
+static int get_stack_access_range(struct bpf_verifier_env *env, int regno, int off,
+				  int *min_off, int *max_off)
 {
 	struct bpf_reg_state *reg = reg_state(env, regno);
-	struct bpf_func_state *state = func(env, reg);
-	int err, min_off, max_off, i, j, slot, spi;
-	/* Some accesses can write anything into the stack, others are
-	 * read-only.
-	 */
-	bool clobber = false;
-
-	if (access_size == 0 && !zero_size_allowed) {
-		verbose(env, "invalid zero-sized read\n");
-		return -EACCES;
-	}
-
-	if (type == BPF_WRITE)
-		clobber = true;
-
-	err = check_stack_access_within_bounds(env, regno, off, access_size, type);
-	if (err)
-		return err;
 
 	if (tnum_is_const(reg->var_off)) {
-		min_off = max_off = reg->var_off.value + off;
+		*min_off = *max_off = reg->var_off.value + off;
 	} else {
 		/* Variable offset is prohibited for unprivileged mode for
 		 * simplicity since it requires corresponding support in
@@ -7922,49 +7891,76 @@ static int check_stack_range_initialized(
 				regno, tn_buf);
 			return -EACCES;
 		}
-		/* Only initialized buffer on stack is allowed to be accessed
-		 * with variable offset. With uninitialized buffer it's hard to
-		 * guarantee that whole memory is marked as initialized on
-		 * helper return since specific bounds are unknown what may
-		 * cause uninitialized stack leaking.
-		 */
-		if (meta && meta->raw_mode)
-			meta = NULL;
-
-		min_off = reg->smin_value + off;
-		max_off = reg->smax_value + off;
+		*min_off = reg->smin_value + off;
+		*max_off = reg->smax_value + off;
 	}
-
-	if (meta && meta->raw_mode) {
-		/* Ensure we won't be overwriting dynptrs when simulating byte
-		 * by byte access in check_helper_call using meta.access_size.
- * This would be a problem if we have a helper in the future - * which takes: - * - * helper(uninit_mem, len, dynptr) - * - * Now, uninint_mem may overlap with dynptr pointer. Hence, it - * may end up writing to dynptr itself when touching memory from - * arg 1. This can be relaxed on a case by case basis for known - * safe cases, but reject due to the possibilitiy of aliasing by - * default. - */ - for (i = min_off; i < max_off + access_size; i++) { - int stack_off = -i - 1; + return 0; +} - spi = __get_spi(i); - /* raw_mode may write past allocated_stack */ - if (state->allocated_stack <= stack_off) - continue; - if (state->stack[spi].slot_type[stack_off % BPF_REG_SIZE] == STACK_DYNPTR) { - verbose(env, "potential write to dynptr at off=%d disallowed\n", i); - return -EACCES; - } - } - meta->access_size = access_size; - meta->regno = regno; +static int allow_uninitialized_stack_range(struct bpf_verifier_env *env, int regno, + int min_off, int max_off, int access_size, + struct bpf_call_arg_meta *meta) +{ + struct bpf_reg_state *reg = reg_state(env, regno); + struct bpf_func_state *state = func(env, reg); + int i, stack_off, spi; + + /* Disallow uninitialized buffer on stack */ + if (!meta || !meta->raw_mode) + return 0; + + /* Only initialized buffer on stack is allowed to be accessed + * with variable offset. With uninitialized buffer it's hard to + * guarantee that whole memory is marked as initialized on + * helper return since specific bounds are unknown what may + * cause uninitialized stack leaking. + */ + if (!tnum_is_const(reg->var_off)) return 0; + + /* Ensure we won't be overwriting dynptrs when simulating byte + * by byte access in check_helper_call using meta.access_size. + * This would be a problem if we have a helper in the future + * which takes: + * + * helper(uninit_mem, len, dynptr) + * + * Now, uninint_mem may overlap with dynptr pointer. Hence, it + * may end up writing to dynptr itself when touching memory from + * arg 1. 
This can be relaxed on a case by case basis for known + * safe cases, but reject due to the possibilitiy of aliasing by + * default. + */ + for (i = min_off; i < max_off + access_size; i++) { + stack_off = -i - 1; + spi = __get_spi(i); + /* raw_mode may write past allocated_stack */ + if (state->allocated_stack <= stack_off) + continue; + if (state->stack[spi].slot_type[stack_off % BPF_REG_SIZE] == STACK_DYNPTR) { + verbose(env, "potential write to dynptr at off=%d disallowed\n", i); + return -EACCES; + } } + meta->access_size = access_size; + meta->regno = regno; + + return 1; +} + +static int check_stack_range_initialized(struct bpf_verifier_env *env, int regno, + int min_off, int max_off, int access_size, + enum bpf_access_type type) +{ + struct bpf_reg_state *reg = reg_state(env, regno); + struct bpf_func_state *state = func(env, reg); + int i, j, slot, spi; + /* Some accesses can write anything into the stack, others are + * read-only. + */ + bool clobber = type == BPF_WRITE; for (i = min_off; i < max_off + access_size; i++) { u8 *stype; @@ -8013,19 +8009,58 @@ static int check_stack_range_initialized( mark: /* reading any byte out of 8-byte 'spill_slot' will cause * the whole slot to be marked as 'read' - */ - mark_reg_read(env, &state->stack[spi].spilled_ptr, - state->stack[spi].spilled_ptr.parent, - REG_LIVE_READ64); - /* We do not set REG_LIVE_WRITTEN for stack slot, as we can not + * + * We do not set REG_LIVE_WRITTEN for stack slot, as we can not * be sure that whether stack slot is written to or not. Hence, * we must still conservatively propagate reads upwards even if * helper may write to the entire memory range. 
*/ + mark_reg_read(env, &state->stack[spi].spilled_ptr, + state->stack[spi].spilled_ptr.parent, + REG_LIVE_READ64); } + return 0; } +/* When register 'regno' is used to read the stack (either directly or through + * a helper function) make sure that it's within stack boundary and, depending + * on the access type and privileges, that all elements of the stack are + * initialized. + * + * 'off' includes 'regno->off', but not its dynamic part (if any). + * + * All registers that have been spilled on the stack in the slots within the + * read offsets are marked as read. + */ +static int check_stack_range_access(struct bpf_verifier_env *env, int regno, int off, + int access_size, bool zero_size_allowed, + enum bpf_access_type type, struct bpf_call_arg_meta *meta) +{ + int err, min_off, max_off; + + if (access_size == 0 && !zero_size_allowed) { + verbose(env, "invalid zero-sized read\n"); + return -EACCES; + } + + err = check_stack_access_within_bounds(env, regno, off, access_size, type); + if (err) + return err; + + err = get_stack_access_range(env, regno, off, &min_off, &max_off); + if (err) + return err; + + err = allow_uninitialized_stack_range(env, regno, min_off, max_off, access_size, meta); + if (err < 0) + return err; + if (err > 0) + return 0; + + return check_stack_range_initialized(env, regno, min_off, max_off, access_size, type); +} + static int check_helper_mem_access(struct bpf_verifier_env *env, int regno, int access_size, enum bpf_access_type access_type, bool zero_size_allowed, @@ -8079,10 +8114,8 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno, access_size, zero_size_allowed, max_access); case PTR_TO_STACK: - return check_stack_range_initialized( - env, - regno, reg->off, access_size, - zero_size_allowed, access_type, meta); + return check_stack_range_access(env, regno, reg->off, access_size, + zero_size_allowed, access_type, meta); case PTR_TO_BTF_ID: return check_ptr_to_btf_access(env, regs, regno, reg->off, access_size, 
BPF_READ, -1);

From patchwork Thu Mar 27 08:34:44 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030840
From: Hou Tao
To: bpf@vger.kernel.org
Subject: [PATCH bpf-next v3 05/16] bpf: Support map key with dynptr in verifier
Date: Thu, 27 Mar 2025 16:34:44 +0800
Message-Id: <20250327083455.848708-6-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

The patch does the following three things to enable dynptr keys for bpf maps:

1) Only allow PTR_TO_STACK typed registers for a dynptr key. The main reason is that a bpf_dynptr can only be defined on the stack, so for a dynptr key only a PTR_TO_STACK typed register is allowed. A bpf_dynptr could also be represented by a CONST_PTR_TO_DYNPTR typed register (e.g., in a callback func or subprog), but that is not supported yet.

2) Only allow a fixed offset for the PTR_TO_STACK register. A variable offset is disallowed, because it would be impossible to check whether the stack access is aligned with BPF_REG_SIZE and matches the location of the dynptr and non-dynptr parts in the map key.

3) Check that the layout of the stack content matches the btf_record. First check that the start offset of the stack access is aligned with BPF_REG_SIZE, then check that the offset and the size of the dynptr/non-dynptr parts in the stack range are consistent with the btf_record of the map key.
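The alignment and layout rules in 2) and 3) can be modeled with a small standalone sketch. Note that toy_field, dynkey_offset_ok() and slot_matches() are hypothetical stand-ins for struct btf_field and the corresponding verifier helpers, not kernel code:

```c
#include <assert.h>

#define BPF_REG_SIZE 8

/* Hypothetical stand-in for struct btf_field: where a dynptr lives
 * inside the map key and how many bytes it occupies. */
struct toy_field {
	unsigned int offset;
	unsigned int size;
};

/* Rule 2): the stack offset of a dynptr-key access must be a constant
 * that is aligned with BPF_REG_SIZE. */
static int dynkey_offset_ok(int off)
{
	return (off % BPF_REG_SIZE) == 0;
}

/* Rule 3): a byte inside [offset, offset + size) of the dynptr field
 * must come from a STACK_DYNPTR slot, and a byte outside it must not. */
static int slot_matches(const struct toy_field *dynptr, int offset,
			int slot_is_dynptr)
{
	int inside = offset >= (int)dynptr->offset &&
		     offset < (int)(dynptr->offset + dynptr->size);

	return inside == !!slot_is_dynptr;
}
```

For a key laid out as 8 bytes of plain data followed by a 16-byte dynptr, offset 0 must be a non-dynptr slot and offset 8 a dynptr slot, mirroring the per-byte checks in check_dynptr_key_access().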
Signed-off-by: Hou Tao --- kernel/bpf/verifier.c | 186 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 173 insertions(+), 13 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 9d611d5152789..05a5636ae4984 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7950,9 +7950,90 @@ static int allow_uninitialized_stack_range(struct bpf_verifier_env *env, int reg return 1; } +struct dynptr_key_state { + const struct btf_record *rec; + const struct btf_field *cur_dynptr; + bool valid_dynptr_id; + int cur_dynptr_id; +}; + +static int init_dynptr_key_state(struct bpf_verifier_env *env, const struct btf_record *rec, + struct dynptr_key_state *state) +{ + unsigned int i; + + /* Find the first dynptr in the dynptr-key */ + for (i = 0; i < rec->cnt; i++) { + if (rec->fields[i].type == BPF_DYNPTR) + break; + } + if (i >= rec->cnt) { + verbose(env, "verifier bug: dynptr not found\n"); + return -EFAULT; + } + + state->rec = rec; + state->cur_dynptr = &rec->fields[i]; + state->valid_dynptr_id = false; + + return 0; +} + +static int check_dynptr_key_access(struct bpf_verifier_env *env, struct dynptr_key_state *state, + struct bpf_reg_state *reg, u8 stype, int offset) +{ + const struct btf_field *dynptr = state->cur_dynptr; + + /* Non-dynptr part before a dynptr or non-dynptr part after + * the last dynptr. 
+ */ + if (offset < dynptr->offset || offset >= dynptr->offset + dynptr->size) { + if (stype == STACK_DYNPTR) { + verbose(env, + "dynptr-key expects non-dynptr at offset %d cur_dynptr_offset %u\n", + offset, dynptr->offset); + return -EACCES; + } + } else { + if (stype != STACK_DYNPTR) { + verbose(env, + "dynptr-key expects dynptr at offset %d cur_dynptr_offset %u\n", + offset, dynptr->offset); + return -EACCES; + } + + /* A dynptr is composed of parts from two dynptrs */ + if (state->valid_dynptr_id && reg->id != state->cur_dynptr_id) { + verbose(env, "malformed dynptr-key at offset %d cur_dynptr_offset %u\n", + offset, dynptr->offset); + return -EACCES; + } + if (!state->valid_dynptr_id) { + state->valid_dynptr_id = true; + state->cur_dynptr_id = reg->id; + } + + if (offset == dynptr->offset + dynptr->size - 1) { + const struct btf_record *rec = state->rec; + unsigned int i; + + for (i = dynptr - rec->fields + 1; i < rec->cnt; i++) { + if (rec->fields[i].type == BPF_DYNPTR) { + state->cur_dynptr = &rec->fields[i]; + state->valid_dynptr_id = false; + break; + } + } + } + } + + return 0; +} + static int check_stack_range_initialized(struct bpf_verifier_env *env, int regno, int min_off, int max_off, int access_size, - enum bpf_access_type type) + enum bpf_access_type type, + struct dynptr_key_state *dynkey) { struct bpf_reg_state *reg = reg_state(env, regno); struct bpf_func_state *state = func(env, reg); @@ -7975,6 +8056,8 @@ static int check_stack_range_initialized(struct bpf_verifier_env *env, int regno stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE]; if (*stype == STACK_MISC) goto mark; + if (dynkey && *stype == STACK_DYNPTR) + goto mark; if ((*stype == STACK_ZERO) || (*stype == STACK_INVALID && env->allow_uninit_stack)) { if (clobber) { @@ -8007,6 +8090,15 @@ static int check_stack_range_initialized(struct bpf_verifier_env *env, int regno } return -EACCES; mark: + if (dynkey) { + int err = check_dynptr_key_access(env, dynkey, + 
&state->stack[spi].spilled_ptr, + *stype, i - min_off); + + if (err) + return err; + } + /* reading any byte out of 8-byte 'spill_slot' will cause * the whole slot to be marked as 'read' * @@ -8058,7 +8150,60 @@ static int check_stack_range_access(struct bpf_verifier_env *env, int regno, int if (err > 0) return 0; - return check_stack_range_initialized(env, regno, min_off, max_off, access_size, type); + return check_stack_range_initialized(env, regno, min_off, max_off, access_size, type, NULL); +} + +static int check_dynkey_stack_access_offset(struct bpf_verifier_env *env, int regno, int off) +{ + struct bpf_reg_state *reg = reg_state(env, regno); + + if (!tnum_is_const(reg->var_off)) { + verbose(env, "R%d variable offset prohibited for dynptr-key\n", regno); + return -EACCES; + } + + off = reg->var_off.value + off; + if (off % BPF_REG_SIZE) { + verbose(env, "R%d misaligned offset %d for dynptr-key\n", regno, off); + return -EACCES; + } + + return 0; +} + +/* It is almost the same as check_stack_range_access(), except the following + * things: + * (1) no need to check whether access_size is zero (due to non-zero key_size) + * (2) disallow uninitialized stack range + * (3) need BPF_REG_SIZE-aligned access with fixed-size offset + * (4) need to check whether the layout of bpf_dynptr part and non-bpf_dynptr + * part in the stack range is the same as the layout of dynptr key + */ +static int check_dynkey_stack_range_access(struct bpf_verifier_env *env, int regno, int off, + int access_size, struct bpf_call_arg_meta *meta) +{ + enum bpf_access_type type = BPF_READ; + struct dynptr_key_state dynkey; + int err, min_off, max_off; + + err = check_stack_access_within_bounds(env, regno, off, access_size, type); + if (err) + return err; + + err = check_dynkey_stack_access_offset(env, regno, off); + if (err) + return err; + + err = get_stack_access_range(env, regno, off, &min_off, &max_off); + if (err) + return err; + + err = init_dynptr_key_state(env, 
meta->map_ptr->key_record, &dynkey); + if (err) + return err; + + return check_stack_range_initialized(env, regno, min_off, max_off, access_size, type, + &dynkey); } static int check_helper_mem_access(struct bpf_verifier_env *env, int regno, @@ -9676,18 +9821,33 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, verbose(env, "invalid map_ptr to access map->key\n"); return -EACCES; } + key_size = meta->map_ptr->key_size; - err = check_helper_mem_access(env, regno, key_size, BPF_READ, false, NULL); - if (err) - return err; - if (can_elide_value_nullness(meta->map_ptr->map_type)) { - err = get_constant_map_key(env, reg, key_size, &meta->const_map_key); - if (err < 0) { - meta->const_map_key = -1; - if (err == -EOPNOTSUPP) - err = 0; - else - return err; + /* Only allow PTR_TO_STACK for dynptr-key */ + if (bpf_map_has_dynptr_key(meta->map_ptr)) { + if (base_type(reg->type) != PTR_TO_STACK) { + verbose(env, "map dynptr-key requires stack ptr but got %s\n", + reg_type_str(env, reg->type)); + return -EACCES; + } + err = check_dynkey_stack_range_access(env, regno, reg->off, key_size, meta); + if (err) + return err; + } else { + err = check_helper_mem_access(env, regno, key_size, BPF_READ, false, NULL); + if (err) + return err; + + if (can_elide_value_nullness(meta->map_ptr->map_type)) { + err = get_constant_map_key(env, reg, key_size, + &meta->const_map_key); + if (err < 0) { + meta->const_map_key = -1; + if (err == -EOPNOTSUPP) + err = 0; + else + return err; + } } } break; From patchwork Thu Mar 27 08:34:45 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 14030839 X-Patchwork-Delegate: bpf@iogearbox.net Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7BD51204F74 for ; 
Thu, 27 Mar 2025 08:22:57 +0000 (UTC)
From: Hou Tao
To: bpf@vger.kernel.org
Subject: [PATCH bpf-next v3 06/16] bpf: Reuse bpf_dynptr
for userspace application use case
Date: Thu, 27 Mar 2025 16:34:45 +0800
Message-Id: <20250327083455.848708-7-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

For bpf maps with dynptr key support, the userspace application will use a bpf_dynptr to represent the variable-sized part in the map key and pass it to the bpf syscall.
The bpf syscall will copy from bpf_dynptr to construct a corresponding bpf_dynptr_kern object when the map key is an input argument, and copy to bpf_dynptr from a bpf_dynptr_kern object when the map key is an output argument. Instead of adding a new uapi struct (e.g., bpf_dynptr_user) for userspace application, reuse bpf_dynptr to unify the API for both bpf program and userspace application. For the userspace application case, the last 4-bytes of bpf_dynptr are not used, so make it a reserved field. Suggested-by: Alexei Starovoitov Signed-off-by: Hou Tao --- include/uapi/linux/bpf.h | 11 ++++++++++- tools/include/uapi/linux/bpf.h | 11 ++++++++++- 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 28705ae677849..560289f0f560b 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -7370,7 +7370,16 @@ struct bpf_wq { } __attribute__((aligned(8))); struct bpf_dynptr { - __u64 __opaque[2]; + union { + /* For bpf program */ + __u64 __opaque[2]; + /* For userspace application only */ + struct { + __bpf_md_ptr(void *, data); + __u32 size; + __u32 reserved; + }; + }; } __attribute__((aligned(8))); struct bpf_list_head { diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 28705ae677849..560289f0f560b 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -7370,7 +7370,16 @@ struct bpf_wq { } __attribute__((aligned(8))); struct bpf_dynptr { - __u64 __opaque[2]; + union { + /* For bpf program */ + __u64 __opaque[2]; + /* For userspace application only */ + struct { + __bpf_md_ptr(void *, data); + __u32 size; + __u32 reserved; + }; + }; } __attribute__((aligned(8))); struct bpf_list_head { From patchwork Thu Mar 27 08:34:46 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 14030841 X-Patchwork-Delegate: bpf@iogearbox.net Received: from 
dggsgout11.his.huawei.com
From: Hou Tao
To: bpf@vger.kernel.org
Subject: [PATCH bpf-next v3 07/16] bpf: Handle bpf_dynptr in bpf syscall when it is used as input
Date: Thu, 27 Mar 2025 16:34:46 +0800
Message-Id: <20250327083455.848708-8-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

Introduce bpf_copy_from_dynptr_ukey()
helper to handle map key with bpf_dynptr when the map key is used in map lookup, update, delete and get_next_key operations. The helper places all variable-length data of these bpf_dynptr objects at the end of the map key to simplify the allocation and the freeing of map key with dynptr. Signed-off-by: Hou Tao --- kernel/bpf/syscall.c | 98 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 87 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 9ded3ba82d356..d6dbcea3c30cb 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1664,10 +1664,83 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value) return -ENOTSUPP; } -static void *__bpf_copy_key(void __user *ukey, u64 key_size) +static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey) { - if (key_size) - return vmemdup_user(ukey, key_size); + const struct btf_record *record; + const struct btf_field *field; + struct bpf_dynptr_kern *kptr; + void *key, *new_key, *kdata; + unsigned int key_size, size; + struct bpf_dynptr *uptr; + bpfptr_t udata; + unsigned int i; + int err; + + key_size = map->key_size; + key = kvmemdup_bpfptr(ukey, key_size); + if (IS_ERR(key)) + return ERR_CAST(key); + + size = key_size; + record = map->key_record; + for (i = 0; i < record->cnt; i++) { + field = &record->fields[i]; + if (field->type != BPF_DYNPTR) + continue; + uptr = key + field->offset; + if (!uptr->size || uptr->reserved) { + err = -EINVAL; + goto free_key; + } + + size += uptr->size; + /* Overflow ? 
*/ + if (size < uptr->size) { + err = -E2BIG; + goto free_key; + } + } + + /* Place all dynptrs' data in the end of the key */ + new_key = kvrealloc(key, size, GFP_USER | __GFP_NOWARN); + if (!new_key) { + err = -ENOMEM; + goto free_key; + } + + key = new_key; + kdata = key + key_size; + for (i = 0; i < record->cnt; i++) { + field = &record->fields[i]; + if (field->type != BPF_DYNPTR) + continue; + + uptr = key + field->offset; + size = uptr->size; + udata = make_bpfptr((u64)(uintptr_t)uptr->data, bpfptr_is_kernel(ukey)); + if (copy_from_bpfptr(kdata, udata, size)) { + err = -EFAULT; + goto free_key; + } + kptr = (struct bpf_dynptr_kern *)uptr; + bpf_dynptr_init(kptr, kdata, BPF_DYNPTR_TYPE_LOCAL, 0, size); + kdata += size; + } + + return key; + +free_key: + kvfree(key); + return ERR_PTR(err); +} + +static void *__bpf_copy_key(const struct bpf_map *map, void __user *ukey) +{ + if (bpf_map_has_dynptr_key(map)) + return bpf_copy_from_dynptr_ukey(map, USER_BPFPTR(ukey)); + + if (map->key_size) + return vmemdup_user(ukey, map->key_size); if (ukey) return ERR_PTR(-EINVAL); @@ -1675,10 +1748,13 @@ static void *__bpf_copy_key(void __user *ukey, u64 key_size) return NULL; } -static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size) +static void *___bpf_copy_key(const struct bpf_map *map, bpfptr_t ukey) { - if (key_size) - return kvmemdup_bpfptr(ukey, key_size); + if (bpf_map_has_dynptr_key(map)) + return bpf_copy_from_dynptr_ukey(map, ukey); + + if (map->key_size) + return kvmemdup_bpfptr(ukey, map->key_size); if (!bpfptr_is_null(ukey)) return ERR_PTR(-EINVAL); @@ -1715,7 +1791,7 @@ static int map_lookup_elem(union bpf_attr *attr) !btf_record_has_field(map->record, BPF_SPIN_LOCK)) return -EINVAL; - key = __bpf_copy_key(ukey, map->key_size); + key = __bpf_copy_key(map, ukey); if (IS_ERR(key)) return PTR_ERR(key); @@ -1782,7 +1858,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr) goto err_put; } - key = ___bpf_copy_key(ukey, map->key_size); + key = 
___bpf_copy_key(map, ukey); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; @@ -1829,7 +1905,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr) goto err_put; } - key = ___bpf_copy_key(ukey, map->key_size); + key = ___bpf_copy_key(map, ukey); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; @@ -1881,7 +1957,7 @@ static int map_get_next_key(union bpf_attr *attr) return -EPERM; if (ukey) { - key = __bpf_copy_key(ukey, map->key_size); + key = __bpf_copy_key(map, ukey); if (IS_ERR(key)) return PTR_ERR(key); } else { @@ -2170,7 +2246,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr) goto err_put; } - key = __bpf_copy_key(ukey, map->key_size); + key = __bpf_copy_key(map, ukey); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put;

From patchwork Thu Mar 27 08:34:47 2025
Subject: [PATCH bpf-next v3 08/16] bpf: Handle bpf_dynptr in bpf syscall when it is used as output
Date: Thu, 27 Mar 2025 16:34:47 +0800
Message-Id: <20250327083455.848708-9-houtao@huaweicloud.com>
From: Hou Tao

For the get_next_key operation, unext_key is used as an output argument. When there is a dynptr in the map key, unext_key will also be used as an input argument, because the userspace application needs to pre-allocate a buffer for each variable-length part in the map key and save the length and the address of these buffers in bpf_dynptr objects.

To support the get_next_key op for a map with a dynptr key, map_get_next_key() first calls bpf_copy_from_dynptr_ukey() to construct a map key in which each bpf_dynptr_kern object has the same size as the corresponding bpf_dynptr object. It then calls ->map_get_next_key() to get the next_key, and finally calls bpf_copy_to_dynptr_ukey() to copy both the non-dynptr part and the dynptr part of the map key to unext_key.
Signed-off-by: Hou Tao --- kernel/bpf/syscall.c | 89 ++++++++++++++++++++++++++++++++++++-------- 1 file changed, 74 insertions(+), 15 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index d6dbcea3c30cb..40c3d85b06bae 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1664,7 +1664,7 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value) return -ENOTSUPP; } -static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey) +static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey, bool copy_data) { const struct btf_record *record; const struct btf_field *field; @@ -1672,7 +1672,6 @@ static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey) void *key, *new_key, *kdata; unsigned int key_size, size; struct bpf_dynptr *uptr; - bpfptr_t udata; unsigned int i; int err; @@ -1687,6 +1686,7 @@ static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey) field = &record->fields[i]; if (field->type != BPF_DYNPTR) continue; + uptr = key + field->offset; if (!uptr->size || uptr->reserved) { err = -EINVAL; @@ -1717,10 +1717,14 @@ static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey) uptr = key + field->offset; size = uptr->size; - udata = make_bpfptr((u64)(uintptr_t)uptr->data, bpfptr_is_kernel(ukey)); - if (copy_from_bpfptr(kdata, udata, size)) { - err = -EFAULT; - goto free_key; + if (copy_data) { + bpfptr_t udata = make_bpfptr((u64)(uintptr_t)uptr->data, + bpfptr_is_kernel(ukey)); + + if (copy_from_bpfptr(kdata, udata, size)) { + err = -EFAULT; + goto free_key; + } } kptr = (struct bpf_dynptr_kern *)uptr; bpf_dynptr_init(kptr, kdata, BPF_DYNPTR_TYPE_LOCAL, 0, size); @@ -1737,7 +1741,7 @@ static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey) static void *__bpf_copy_key(const struct bpf_map *map, void __user *ukey) { if (bpf_map_has_dynptr_key(map)) - return bpf_copy_from_dynptr_ukey(map, 
USER_BPFPTR(ukey)); + return bpf_copy_from_dynptr_ukey(map, USER_BPFPTR(ukey), true); if (map->key_size) return vmemdup_user(ukey, map->key_size); @@ -1751,7 +1755,7 @@ static void *__bpf_copy_key(const struct bpf_map *map, void __user *ukey) static void *___bpf_copy_key(const struct bpf_map *map, bpfptr_t ukey) { if (bpf_map_has_dynptr_key(map)) - return bpf_copy_from_dynptr_ukey(map, ukey); + return bpf_copy_from_dynptr_ukey(map, ukey, true); if (map->key_size) return kvmemdup_bpfptr(ukey, map->key_size); @@ -1762,6 +1766,51 @@ static void *___bpf_copy_key(const struct bpf_map *map, bpfptr_t ukey) return NULL; } +static int bpf_copy_to_dynptr_ukey(const struct bpf_map *map, + void __user *ukey, void *key) +{ + struct bpf_dynptr __user *uptr; + struct bpf_dynptr_kern *kptr; + struct btf_record *record; + unsigned int i, offset; + + offset = 0; + record = map->key_record; + for (i = 0; i < record->cnt; i++) { + struct btf_field *field; + unsigned int size; + void *udata; + + field = &record->fields[i]; + if (field->type != BPF_DYNPTR) + continue; + + /* Any no-dynptr part before the dynptr ? 
*/ + if (offset < field->offset && + copy_to_user(ukey + offset, key + offset, field->offset - offset)) + return -EFAULT; + + /* dynptr part */ + uptr = ukey + field->offset; + if (copy_from_user(&udata, &uptr->data, sizeof(udata))) + return -EFAULT; + + kptr = key + field->offset; + size = __bpf_dynptr_size(kptr); + if (copy_to_user((void __user *)udata, __bpf_dynptr_data(kptr, size), size) || + put_user(size, &uptr->size) || put_user(0, &uptr->reserved)) + return -EFAULT; + + offset = field->offset + field->size; + } + + if (offset < map->key_size && + copy_to_user(ukey + offset, key + offset, map->key_size - offset)) + return -EFAULT; + + return 0; +} + /* last field in 'union bpf_attr' used by this command */ #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD flags @@ -1964,10 +2013,19 @@ static int map_get_next_key(union bpf_attr *attr) key = NULL; } - err = -ENOMEM; - next_key = kvmalloc(map->key_size, GFP_USER); - if (!next_key) + if (bpf_map_has_dynptr_key(map)) + next_key = bpf_copy_from_dynptr_ukey(map, USER_BPFPTR(unext_key), false); + else + next_key = kvmalloc(map->key_size, GFP_USER); + if (IS_ERR_OR_NULL(next_key)) { + if (!next_key) { + err = -ENOMEM; + } else { + err = PTR_ERR(next_key); + next_key = NULL; + } goto free_key; + } if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_get_next_key(map, key, next_key); @@ -1981,12 +2039,13 @@ static int map_get_next_key(union bpf_attr *attr) if (err) goto free_next_key; - err = -EFAULT; - if (copy_to_user(unext_key, next_key, map->key_size) != 0) + if (bpf_map_has_dynptr_key(map)) + err = bpf_copy_to_dynptr_ukey(map, unext_key, next_key); + else + err = copy_to_user(unext_key, next_key, map->key_size) ? 
-EFAULT : 0; + if (err) goto free_next_key; - err = 0; - free_next_key: kvfree(next_key); free_key:

From patchwork Thu Mar 27 08:34:48 2025
Subject: [PATCH bpf-next v3 09/16] bpf: Support basic operations for dynptr key in hash map
Date: Thu, 27 Mar 2025 16:34:48 +0800
Message-Id: <20250327083455.848708-10-houtao@huaweicloud.com>
From: Hou Tao

The patch supports lookup, update, delete and lookup_delete operations for the hash map with a dynptr key. There are two major differences between the implementations of the normal hash map and the dynptr-keyed hash map:

1) The dynptr-keyed hash map doesn't support pre-allocation. The reason is that the dynptrs in the map key are allocated dynamically through the bpf memory allocator. The length limitation for these dynptrs is 4088 bytes for now. Because these dynptrs are allocated dynamically, the memory consumption will be smaller compared with the normal hash map when there are big differences between the lengths of these dynptrs.

2) A freed element in the dynptr-keyed map will not be reused immediately. For the normal hash map, a freed element may be reused immediately by a newly-added element, so a lookup may return an incorrect result due to element deletion and element reuse. However, the dynptr-keyed map cannot do that: there are pointers (dynptrs) in the map key, and the updates of these dynptrs are not atomic, because both the address and the length of the dynptr will be updated. If the element were reused immediately, accessing the dynptr in the freed element could incur invalid memory access due to a mismatch between the address and the size of the dynptr, so the freed element is reused only after one RCU grace period.

Besides the differences above, the dynptr-keyed hash map also needs to handle a maybe-nullified dynptr in the map key.

After the support of dynptr keys in the hash map, the performance of lookup and update/delete operations in map_perf_test degrades a lot. Marking lookup_nulls_elem_raw() and lookup_elem_raw() as always_inline will narrow the gap from 22%/10% to ~3%.
Therefore, the patch also adds always_inline for these two hot functions. The following lines show the detailed performance numbers:

before patch:
0:hash_map_perf pre-alloc 716183 events per sec
0:hash_map_perf kmalloc 718449 events per sec
0:hash_lookup 96028984 lookups per sec

after patch (without always_inline):
0:hash_map_perf pre-alloc 680580 events per sec
0:hash_map_perf kmalloc 648885 events per sec
0:hash_lookup 77693901 lookups per sec

after patch:
0:hash_map_perf pre-alloc 701188 events per sec
0:hash_map_perf kmalloc 690954 events per sec
0:hash_lookup 93802965 lookups per sec

Signed-off-by: Hou Tao --- kernel/bpf/hashtab.c | 291 ++++++++++++++++++++++++++++++++++++++----- 1 file changed, 261 insertions(+), 30 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 5a5adc66b8e22..028542c2b4237 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -105,6 +105,7 @@ struct bpf_htab { u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; + struct bpf_mem_alloc dynptr_ma; }; /* each htab element is struct htab_elem + key + value */ @@ -586,13 +587,55 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) return ERR_PTR(err); } -static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd) +static inline u32 __htab_map_hash(const void *key, u32 key_len, u32 hashrnd) { if (likely(key_len % 4 == 0)) return jhash2(key, key_len / 4, hashrnd); return jhash(key, key_len, hashrnd); } +static u32 htab_map_dynptr_hash(const void *key, u32 key_len, u32 hashrnd, + const struct btf_record *rec) +{ + unsigned int i, cnt = rec->cnt; + unsigned int hash = hashrnd; + unsigned int offset = 0; + + for (i = 0; i < cnt; i++) { + const struct btf_field *field = &rec->fields[i]; + const struct bpf_dynptr_kern *kptr; + unsigned int len; + + if (field->type != BPF_DYNPTR) + continue; + + /* non-dynptr part ? 
*/ + if (offset < field->offset) + hash = jhash(key + offset, field->offset - offset, hash); + + /* Skip nullified dynptr */ + kptr = key + field->offset; + if (kptr->data) { + len = __bpf_dynptr_size(kptr); + hash = jhash(__bpf_dynptr_data(kptr, len), len, hash); + } + offset = field->offset + field->size; + } + + if (offset < key_len) + hash = jhash(key + offset, key_len - offset, hash); + + return hash; +} + +static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd, + const struct btf_record *rec) +{ + if (likely(!rec)) + return __htab_map_hash(key, key_len, hashrnd); + return htab_map_dynptr_hash(key, key_len, hashrnd, rec); +} + static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash) { return &htab->buckets[hash & (htab->n_buckets - 1)]; @@ -603,15 +646,68 @@ static inline struct hlist_nulls_head *select_bucket(struct bpf_htab *htab, u32 return &__select_bucket(htab, hash)->head; } +static bool is_same_dynptr_key(const void *key, const void *tgt, unsigned int key_size, + const struct btf_record *rec) +{ + unsigned int i, cnt = rec->cnt; + unsigned int offset = 0; + + for (i = 0; i < cnt; i++) { + const struct btf_field *field = &rec->fields[i]; + const struct bpf_dynptr_kern *kptr, *tgt_kptr; + const void *data, *tgt_data; + unsigned int len; + + if (field->type != BPF_DYNPTR) + continue; + + if (offset < field->offset && + memcmp(key + offset, tgt + offset, field->offset - offset)) + return false; + + /* + * For a nullified dynptr in the target key, __bpf_dynptr_size() + * will return 0, and there will be no match for the target key. 
+ */ + kptr = key + field->offset; + tgt_kptr = tgt + field->offset; + len = __bpf_dynptr_size(kptr); + if (len != __bpf_dynptr_size(tgt_kptr)) + return false; + + data = __bpf_dynptr_data(kptr, len); + tgt_data = __bpf_dynptr_data(tgt_kptr, len); + if (memcmp(data, tgt_data, len)) + return false; + + offset = field->offset + field->size; + } + + if (offset < key_size && + memcmp(key + offset, tgt + offset, key_size - offset)) + return false; + + return true; +} + +static inline bool htab_is_same_key(const void *key, const void *tgt, unsigned int key_size, + const struct btf_record *rec) +{ + if (likely(!rec)) + return !memcmp(key, tgt, key_size); + return is_same_dynptr_key(key, tgt, key_size, rec); +} + /* this lookup function can only be called with bucket lock taken */ -static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash, - void *key, u32 key_size) +static __always_inline struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash, + void *key, u32 key_size, + const struct btf_record *record) { struct hlist_nulls_node *n; struct htab_elem *l; hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) - if (l->hash == hash && !memcmp(&l->key, key, key_size)) + if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record)) return l; return NULL; @@ -621,16 +717,17 @@ static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash * the unlikely event when elements moved from one bucket into another * while link list is being walked */ -static struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head, - u32 hash, void *key, - u32 key_size, u32 n_buckets) +static __always_inline struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head, + u32 hash, void *key, + u32 key_size, u32 n_buckets, + const struct btf_record *record) { struct hlist_nulls_node *n; struct htab_elem *l; again: hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) - if (l->hash == hash && !memcmp(&l->key, 
key, key_size)) + if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record)) return l; if (unlikely(get_nulls_value(n) != (hash & (n_buckets - 1)))) @@ -647,6 +744,7 @@ static struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head, static void *__htab_map_lookup_elem(struct bpf_map *map, void *key) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); + const struct btf_record *record; struct hlist_nulls_head *head; struct htab_elem *l; u32 hash, key_size; @@ -655,12 +753,13 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key) !rcu_read_lock_bh_held()); key_size = map->key_size; + record = map->key_record; - hash = htab_map_hash(key, key_size, htab->hashrnd); + hash = htab_map_hash(key, key_size, htab->hashrnd, record); head = select_bucket(htab, hash); - l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets); + l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets, record); return l; } @@ -750,6 +849,26 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map, return insn - insn_buf; } +static void htab_free_dynptr_key(struct bpf_htab *htab, void *key) +{ + const struct btf_record *record = htab->map.key_record; + unsigned int i, cnt = record->cnt; + + for (i = 0; i < cnt; i++) { + const struct btf_field *field = &record->fields[i]; + struct bpf_dynptr_kern *kptr; + + if (field->type != BPF_DYNPTR) + continue; + + /* It may be accessed concurrently, so don't overwrite + * the kptr. 
+ */ + kptr = key + field->offset; + bpf_mem_free_rcu(&htab->dynptr_ma, kptr->data); + } +} + static void check_and_free_fields(struct bpf_htab *htab, struct htab_elem *elem) { @@ -804,6 +923,68 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) return l == tgt_l; } +static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void *key, u32 key_size) +{ + const struct btf_record *rec = htab->map.key_record; + struct bpf_dynptr_kern *dst_kptr; + const struct btf_field *field; + unsigned int i, cnt, offset; + int err; + + offset = 0; + cnt = rec->cnt; + for (i = 0; i < cnt; i++) { + const struct bpf_dynptr_kern *kptr; + unsigned int len; + const void *data; + void *dst_data; + + field = &rec->fields[i]; + if (field->type != BPF_DYNPTR) + continue; + + if (offset < field->offset) + memcpy(dst_key + offset, key + offset, field->offset - offset); + + /* Doesn't support nullified dynptr in map key */ + kptr = key + field->offset; + if (!kptr->data) { + err = -EINVAL; + goto out; + } + len = __bpf_dynptr_size(kptr); + data = __bpf_dynptr_data(kptr, len); + + dst_data = bpf_mem_alloc(&htab->dynptr_ma, len); + if (!dst_data) { + err = -ENOMEM; + goto out; + } + + memcpy(dst_data, data, len); + dst_kptr = dst_key + field->offset; + bpf_dynptr_init(dst_kptr, dst_data, BPF_DYNPTR_TYPE_LOCAL, 0, len); + + offset = field->offset + field->size; + } + + if (offset < key_size) + memcpy(dst_key + offset, key + offset, key_size - offset); + + return 0; + +out: + while (i-- > 0) { + field = &rec->fields[i]; + if (field->type != BPF_DYNPTR) + continue; + + dst_kptr = dst_key + field->offset; + bpf_mem_free(&htab->dynptr_ma, dst_kptr->data); + } + return err; +} + /* Called from syscall */ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) { @@ -820,12 +1001,12 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) if (!key) goto find_first_elem; - hash = htab_map_hash(key, key_size, 
htab->hashrnd); + hash = htab_map_hash(key, key_size, htab->hashrnd, NULL); head = select_bucket(htab, hash); /* lookup the key */ - l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets); + l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets, NULL); if (!l) goto find_first_elem; @@ -865,11 +1046,27 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l) { + bool dynptr_in_key = bpf_map_has_dynptr_key(&htab->map); + + if (dynptr_in_key) + htab_free_dynptr_key(htab, l->key); + check_and_free_fields(htab, l); if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr); - bpf_mem_cache_free(&htab->ma, l); + + /* + * For dynptr key, the update of dynptr in the key is not atomic: + * both the pointer and the size are updated. If the element is reused + * immediately, the access of the dynptr key during lookup procedure may + * incur invalid memory access due to mismatch between the size and the + * data pointer, so reuse the element after one RCU GP. 
+	 */
+	if (dynptr_in_key)
+		bpf_mem_cache_free_rcu(&htab->ma, l);
+	else
+		bpf_mem_cache_free(&htab->ma, l);
 }
 
 static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
@@ -1016,7 +1213,19 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 		}
 	}
 
-	memcpy(l_new->key, key, key_size);
+	if (bpf_map_has_dynptr_key(&htab->map)) {
+		int copy_err;
+
+		copy_err = htab_copy_dynptr_key(htab, l_new->key, key, key_size);
+		if (copy_err) {
+			bpf_mem_cache_free(&htab->ma, l_new);
+			l_new = ERR_PTR(copy_err);
+			goto dec_count;
+		}
+	} else {
+		memcpy(l_new->key, key, key_size);
+	}
+
 	if (percpu) {
 		if (prealloc) {
 			pptr = htab_elem_get_ptr(l_new, key_size);
@@ -1072,7 +1281,8 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 				 u64 map_flags)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
-	struct htab_elem *l_new = NULL, *l_old;
+	const struct btf_record *key_record = map->key_record;
+	struct htab_elem *l_new, *l_old;
 	struct hlist_nulls_head *head;
 	unsigned long flags;
 	void *old_map_ptr;
@@ -1089,7 +1299,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 
 	key_size = map->key_size;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = htab_map_hash(key, key_size, htab->hashrnd, key_record);
 
 	b = __select_bucket(htab, hash);
 	head = &b->head;
@@ -1099,7 +1309,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 			return -EINVAL;
 		/* find an element without taking the bucket lock */
 		l_old = lookup_nulls_elem_raw(head, hash, key, key_size,
-					      htab->n_buckets);
+					      htab->n_buckets, key_record);
 		ret = check_flags(htab, l_old, map_flags);
 		if (ret)
 			return ret;
@@ -1120,7 +1330,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	if (ret)
 		return ret;
 
-	l_old = lookup_elem_raw(head, hash, key, key_size);
+	l_old = lookup_elem_raw(head, hash, key, key_size, key_record);
 
 	ret = check_flags(htab, l_old, map_flags);
 	if (ret)
@@ -1207,7 +1417,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
 
 	key_size = map->key_size;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = __htab_map_hash(key, key_size, htab->hashrnd);
 
 	b = __select_bucket(htab, hash);
 	head = &b->head;
@@ -1227,7 +1437,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
 	if (ret)
 		goto err_lock_bucket;
 
-	l_old = lookup_elem_raw(head, hash, key, key_size);
+	l_old = lookup_elem_raw(head, hash, key, key_size, NULL);
 
 	ret = check_flags(htab, l_old, map_flags);
 	if (ret)
@@ -1276,7 +1486,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 
 	key_size = map->key_size;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = __htab_map_hash(key, key_size, htab->hashrnd);
 
 	b = __select_bucket(htab, hash);
 	head = &b->head;
@@ -1285,7 +1495,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 	if (ret)
 		return ret;
 
-	l_old = lookup_elem_raw(head, hash, key, key_size);
+	l_old = lookup_elem_raw(head, hash, key, key_size, NULL);
 
 	ret = check_flags(htab, l_old, map_flags);
 	if (ret)
@@ -1331,7 +1541,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 
 	key_size = map->key_size;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = htab_map_hash(key, key_size, htab->hashrnd, NULL);
 
 	b = __select_bucket(htab, hash);
 	head = &b->head;
@@ -1351,7 +1561,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 	if (ret)
 		goto err_lock_bucket;
 
-	l_old = lookup_elem_raw(head, hash, key, key_size);
+	l_old = lookup_elem_raw(head, hash, key, key_size, NULL);
 
 	ret = check_flags(htab, l_old, map_flags);
 	if (ret)
@@ -1397,6 +1607,7 @@ static long htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 static long htab_map_delete_elem(struct bpf_map *map, void *key)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	const struct btf_record *key_record = map->key_record;
 	struct hlist_nulls_head *head;
 	struct bucket *b;
 	struct htab_elem *l;
@@ -1409,7 +1620,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
 
 	key_size = map->key_size;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = htab_map_hash(key, key_size, htab->hashrnd, key_record);
 
 	b = __select_bucket(htab, hash);
 	head = &b->head;
@@ -1417,7 +1628,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
 	if (ret)
 		return ret;
 
-	l = lookup_elem_raw(head, hash, key, key_size);
+	l = lookup_elem_raw(head, hash, key, key_size, key_record);
 
 	if (l)
 		hlist_nulls_del_rcu(&l->hash_node);
 	else
@@ -1445,7 +1656,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key)
 
 	key_size = map->key_size;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = __htab_map_hash(key, key_size, htab->hashrnd);
 
 	b = __select_bucket(htab, hash);
 	head = &b->head;
@@ -1453,7 +1664,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key)
 	if (ret)
 		return ret;
 
-	l = lookup_elem_raw(head, hash, key, key_size);
+	l = lookup_elem_raw(head, hash, key, key_size, NULL);
 
 	if (l)
 		hlist_nulls_del_rcu(&l->hash_node);
@@ -1547,6 +1758,7 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_map_free_elem_count(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->dynptr_ma);
 	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 	if (htab->use_percpu_counter)
@@ -1580,6 +1792,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 					     bool is_percpu, u64 flags)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	const struct btf_record *key_record;
 	struct hlist_nulls_head *head;
 	unsigned long bflags;
 	struct htab_elem *l;
@@ -1588,8 +1801,9 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 	int ret;
 
 	key_size = map->key_size;
+	key_record = map->key_record;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = htab_map_hash(key, key_size, htab->hashrnd, key_record);
 
 	b = __select_bucket(htab, hash);
 	head = &b->head;
@@ -1597,7 +1811,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 	if (ret)
 		return ret;
 
-	l = lookup_elem_raw(head, hash, key, key_size);
+	l = lookup_elem_raw(head, hash, key, key_size, key_record);
 	if (!l) {
 		ret = -ENOENT;
 		goto out_unlock;
@@ -2251,6 +2465,22 @@ static u64 htab_map_mem_usage(const struct bpf_map *map)
 	return usage;
 }
 
+static int htab_map_check_btf(const struct bpf_map *map, const struct btf *btf,
+			      const struct btf_type *key_type, const struct btf_type *value_type)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+	/* Only support non-preallocated map */
+	if (bpf_map_has_dynptr_key(map)) {
+		if (htab_is_prealloc(htab))
+			return -EINVAL;
+
+		return bpf_mem_alloc_init(&htab->dynptr_ma, 0, false);
+	}
+
+	return 0;
+}
+
 BTF_ID_LIST_SINGLE(htab_map_btf_ids, struct, bpf_htab)
 const struct bpf_map_ops htab_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
@@ -2264,6 +2494,7 @@ const struct bpf_map_ops htab_map_ops = {
 	.map_update_elem = htab_map_update_elem,
 	.map_delete_elem = htab_map_delete_elem,
 	.map_gen_lookup = htab_map_gen_lookup,
+	.map_check_btf = htab_map_check_btf,
 	.map_seq_show_elem = htab_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,

From patchwork Thu Mar 27 08:34:49 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030843
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org
Subject: [PATCH bpf-next v3 10/16] bpf: Export bpf_dynptr_set_size
Date: Thu, 27 Mar 2025 16:34:49 +0800
Message-Id: <20250327083455.848708-11-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

It will be used by the following patch to shrink the size of the dynptr
when the actual data length is smaller than the size of the dynptr
during the map_get_next_key operation.
Signed-off-by: Hou Tao
---
 include/linux/bpf.h  | 1 +
 kernel/bpf/helpers.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 737890e5c58b4..4f4b43b68f8d1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1361,6 +1361,7 @@ enum bpf_dynptr_type {
 };
 
 int bpf_dynptr_check_size(u32 size);
+void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size);
 u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr);
 const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len);
 void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index ddaa41a70676c..67c13a6d20dae 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1688,7 +1688,7 @@ u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr)
 	return ptr->size & DYNPTR_SIZE_MASK;
 }
 
-static void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size)
+void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size)
 {
 	u32 metadata = ptr->size & ~DYNPTR_SIZE_MASK;

From patchwork Thu Mar 27 08:34:50 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030852
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org
Subject: [PATCH bpf-next v3 11/16] bpf: Support get_next_key operation for dynptr key in hash map
Date: Thu, 27 Mar 2025 16:34:50 +0800
Message-Id: <20250327083455.848708-12-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

First pass the key_record to htab_map_hash() and
lookup_nulls_elem_raw() to find the target element, then use the
htab_copy_dynptr_key() helper to copy from the target key to the next
key used for output.
Signed-off-by: Hou Tao
---
 kernel/bpf/hashtab.c | 56 ++++++++++++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 028542c2b4237..2c3017086e4ab 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -923,7 +923,8 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
 	return l == tgt_l;
 }
 
-static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void *key, u32 key_size)
+static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void *key, u32 key_size,
+				bool copy_in)
 {
 	const struct btf_record *rec = htab->map.key_record;
 	struct bpf_dynptr_kern *dst_kptr;
@@ -948,22 +949,32 @@ static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void
 		/* Doesn't support nullified dynptr in map key */
 		kptr = key + field->offset;
-		if (!kptr->data) {
+		if (copy_in && !kptr->data) {
 			err = -EINVAL;
 			goto out;
 		}
 
 		len = __bpf_dynptr_size(kptr);
 		data = __bpf_dynptr_data(kptr, len);
-		dst_data = bpf_mem_alloc(&htab->dynptr_ma, len);
-		if (!dst_data) {
-			err = -ENOMEM;
-			goto out;
-		}
+		dst_kptr = dst_key + field->offset;
+		if (copy_in) {
+			dst_data = bpf_mem_alloc(&htab->dynptr_ma, len);
+			if (!dst_data) {
+				err = -ENOMEM;
+				goto out;
+			}
+			bpf_dynptr_init(dst_kptr, dst_data, BPF_DYNPTR_TYPE_LOCAL, 0, len);
+		} else {
+			dst_data = __bpf_dynptr_data_rw(dst_kptr, len);
+			if (!dst_data) {
+				err = -ENOSPC;
+				goto out;
+			}
+			if (__bpf_dynptr_size(dst_kptr) > len)
+				bpf_dynptr_set_size(dst_kptr, len);
+		}
 
 		memcpy(dst_data, data, len);
-		dst_kptr = dst_key + field->offset;
-		bpf_dynptr_init(dst_kptr, dst_data, BPF_DYNPTR_TYPE_LOCAL, 0, len);
 
 		offset = field->offset + field->size;
 	}
@@ -974,7 +985,7 @@ static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void
 	return 0;
 
 out:
-	while (i-- > 0) {
+	while (copy_in && i-- > 0) {
 		field = &rec->fields[i];
 		if (field->type != BPF_DYNPTR)
 			continue;
@@ -985,10 +996,22 @@ static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void
 	return err;
 }
 
+static inline int htab_copy_next_key(struct bpf_htab *htab, void *next_key, const void *key,
+				     u32 key_size)
+{
+	if (!bpf_map_has_dynptr_key(&htab->map)) {
+		memcpy(next_key, key, key_size);
+		return 0;
+	}
+
+	return htab_copy_dynptr_key(htab, next_key, key, key_size, false);
+}
+
 /* Called from syscall */
 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	const struct btf_record *key_record = map->key_record;
 	struct hlist_nulls_head *head;
 	struct htab_elem *l, *next_l;
 	u32 hash, key_size;
@@ -1001,13 +1024,12 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	if (!key)
 		goto find_first_elem;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd, NULL);
+	hash = htab_map_hash(key, key_size, htab->hashrnd, key_record);
 
 	head = select_bucket(htab, hash);
 
 	/* lookup the key */
-	l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets, NULL);
-
+	l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets, key_record);
 	if (!l)
 		goto find_first_elem;
 
@@ -1017,8 +1039,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 
 	if (next_l) {
 		/* if next elem in this hash list is non-zero, just return it */
-		memcpy(next_key, next_l->key, key_size);
-		return 0;
+		return htab_copy_next_key(htab, next_key, next_l->key, key_size);
 	}
 
 	/* no more elements in this hash list, go to the next bucket */
@@ -1035,8 +1056,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 				  struct htab_elem, hash_node);
 		if (next_l) {
 			/* if it's not empty, just return it */
-			memcpy(next_key, next_l->key, key_size);
-			return 0;
+			return htab_copy_next_key(htab, next_key, next_l->key, key_size);
 		}
 	}
 
@@ -1216,7 +1236,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 	if (bpf_map_has_dynptr_key(&htab->map)) {
 		int copy_err;
 
-		copy_err = htab_copy_dynptr_key(htab, l_new->key, key, key_size);
+		copy_err = htab_copy_dynptr_key(htab, l_new->key, key, key_size, true);
 		if (copy_err) {
 			bpf_mem_cache_free(&htab->ma, l_new);
 			l_new = ERR_PTR(copy_err);

From patchwork Thu Mar 27 08:34:51 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030848
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org
Subject: [PATCH bpf-next v3 12/16] bpf: Disable unsupported operations for map with dynptr key
Date: Thu, 27 Mar 2025 16:34:51 +0800
Message-Id: <20250327083455.848708-13-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

Batched map operations, dumping the map content through bpffs, and
iterating over each element using the bpf_for_each_map_elem() helper or
the bpf map element iterator are not supported for maps with dynptr
keys. Therefore, disable these operations for now.

Signed-off-by: Hou Tao
---
 include/linux/bpf.h   | 3 ++-
 kernel/bpf/map_iter.c | 3 +++
 kernel/bpf/syscall.c  | 4 ++++
 kernel/bpf/verifier.c | 4 ++++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4f4b43b68f8d1..59295dd8d6fd3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -629,7 +629,8 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
 static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
 {
 	return (map->btf_value_type_id || map->btf_vmlinux_value_type_id) &&
-		map->ops->map_seq_show_elem;
+		map->ops->map_seq_show_elem &&
+		!bpf_map_has_dynptr_key(map);
 }
 
 int map_check_no_btf(const struct bpf_map *map,
diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
index 9575314f40a69..775d8bc63ed5d 100644
--- a/kernel/bpf/map_iter.c
+++ b/kernel/bpf/map_iter.c
@@ -113,6 +113,9 @@ static int bpf_iter_attach_map(struct bpf_prog *prog,
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
+	if (bpf_map_has_dynptr_key(map))
+		goto put_map;
+
 	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
 	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
 	    map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 40c3d85b06bae..24599749dc6f9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5508,6 +5508,10 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 		err = -EPERM;
 		goto err_put;
 	}
+	if (bpf_map_has_dynptr_key(map)) {
+		err = -EOPNOTSUPP;
+		goto err_put;
+	}
 
 	if (cmd == BPF_MAP_LOOKUP_BATCH)
 		BPF_DO_BATCH(map->ops->map_lookup_batch, map, attr, uattr);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 05a5636ae4984..fea94fcd8bf25 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10246,6 +10246,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (map->map_type != BPF_MAP_TYPE_CGRP_STORAGE)
 			goto error;
 		break;
+	case BPF_FUNC_for_each_map_elem:
+		if (bpf_map_has_dynptr_key(map))
+			goto error;
+		break;
 	default:
 		break;
 	}

From patchwork Thu Mar 27 08:34:52 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030845
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org
Subject: [PATCH bpf-next v3 13/16] bpf: Enable the creation of hash map with dynptr key
Date: Thu, 27 Mar 2025 16:34:52 +0800
Message-Id: <20250327083455.848708-14-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

The support for bpf_dynptr key in hash map is in place, therefore
enable the creation of hash map with dynptr key in map_create().
Signed-off-by: Hou Tao
---
 kernel/bpf/syscall.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 24599749dc6f9..f5bda7fdb746f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1262,9 +1262,10 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
 			ret = -EACCES;
 			goto free_map_tab;
 		}
-		/* Enable for BPF_MAP_TYPE_HASH later */
-		ret = -EOPNOTSUPP;
-		goto free_map_tab;
+		if (map->map_type != BPF_MAP_TYPE_HASH) {
+			ret = -EOPNOTSUPP;
+			goto free_map_tab;
+		}
 	} else if (IS_ERR(map->key_record)) {
 		/* Return an error early even the bpf program doesn't use it */
 		ret = PTR_ERR(map->key_record);

From patchwork Thu Mar 27 08:34:53 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 14030846
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org
Subject: [PATCH bpf-next v3 14/16] selftests/bpf: Add bpf_dynptr_user_init() helper
Date: Thu, 27 Mar 2025 16:34:53 +0800
Message-Id: <20250327083455.848708-15-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>
0DM28IrcIa0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4UJVWxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2 WlYx0E2Ix0cI8IcVAFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkE bVWUJVW8JwACjcxG0xvY0x0EwIxGrwACI402YVCY1x02628vn2kIc2xKxwCY1x0262kKe7 AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02 F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GF ylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI42IY6xIIjxv20xvEc7Cj xVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI 0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x 07UZTmfUUUUU= X-CM-SenderInfo: xkrx3t3r6k3tpzhluzxrxghudrp/ X-Patchwork-Delegate: bpf@iogearbox.net From: Hou Tao Add bpf_dynptr_user_init() to initialize a bpf_dynptr object. It will be used by test_progs and bench. User can dereference the {data|size} fields directly to get the address and length of the dynptr object. 
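The helper being added is tiny; a standalone userspace sketch behaves like this. Note that `struct bpf_dynptr_user` below is a hypothetical stand-in mirroring the {data, size, reserved} layout the selftest helper assumes; the real uapi `struct bpf_dynptr` is an opaque 16 bytes:

```c
#include <stdint.h>

/* Hypothetical stand-in for the user-visible dynptr layout assumed by the
 * selftest helper; the real uapi 'struct bpf_dynptr' is opaque. */
struct bpf_dynptr_user {
	void *data;        /* address of the key payload */
	uint32_t size;     /* length of the payload in bytes */
	uint32_t reserved; /* must be zero; the kernel rejects nonzero values */
};

/* Same shape as the bpf_dynptr_user_init() added to testing_helpers.h:
 * sys_bpf() itself validates data and size, so no checks are done here. */
static inline void bpf_dynptr_user_init(void *data, uint32_t size,
					struct bpf_dynptr_user *dynptr)
{
	dynptr->data = data;
	dynptr->size = size;
	dynptr->reserved = 0;
}
```

A caller would then pass the initialized object (or a key struct embedding it) as the key argument of bpf_map_lookup_elem() and friends.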
Signed-off-by: Hou Tao
---
 tools/testing/selftests/bpf/testing_helpers.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/bpf/testing_helpers.h b/tools/testing/selftests/bpf/testing_helpers.h
index 46d7f7089f636..d93ee5bfa7f6f 100644
--- a/tools/testing/selftests/bpf/testing_helpers.h
+++ b/tools/testing/selftests/bpf/testing_helpers.h
@@ -58,4 +58,13 @@ int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt);
 int testing_prog_flags(void);
 bool is_jit_enabled(void);
 
+/* sys_bpf() will check the validity of data and size */
+static inline void bpf_dynptr_user_init(void *data, __u32 size,
+					struct bpf_dynptr *dynptr)
+{
+	dynptr->data = data;
+	dynptr->size = size;
+	dynptr->reserved = 0;
+}
+
 #endif /* __TESTING_HELPERS_H */

From patchwork Thu Mar 27 08:34:54 2025
From: Hou Tao
Subject: [PATCH bpf-next v3 15/16] selftests/bpf: Add test cases for hash map with dynptr key
Date: Thu, 27 Mar 2025 16:34:54 +0800
Message-Id: <20250327083455.848708-16-houtao@huaweicloud.com>
In-Reply-To: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

Add three positive test cases to exercise the basic operations on a
dynptr-keyed hash map: lookup, update, delete and get_next_key. These
operations are exercised both through the bpf syscall and from a bpf
program.

The three test cases use different map keys. The first uses both a bare
bpf_dynptr and a struct containing only a bpf_dynptr as the map key,
the second uses a struct with an integer and a bpf_dynptr, and the last
uses a struct with two bpf_dynptrs: one directly in the struct and
another nested in an inner struct.

Also add multiple negative test cases for the dynptr-keyed hash map.
These mainly check that the layout of dynptr and non-dynptr fields on
the stack matches the definition in map->key_record.
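Several of the negative cases hinge on where the dynptr lands inside the key struct. A minimal layout check, using a mocked 16-byte, 8-byte-aligned dynptr (an assumption standing in for the opaque uapi type), also shows why the positive tests memset() the key first: the int-plus-dynptr key has a padding hole at bytes 4..7 that would otherwise hold garbage:

```c
#include <stddef.h>
#include <stdint.h>

/* Mock dynptr: two 64-bit words, forced to 8-byte alignment (assumption
 * mirroring the opaque uapi type, not the kernel definition itself). */
struct mock_bpf_dynptr {
	uint64_t __opaque[2];
} __attribute__((aligned(8)));

/* Same shape as id_dname_key in the test: an int id, then a dynptr. */
struct id_dname_key {
	int id;
	struct mock_bpf_dynptr name;
};

/* Alignment pushes 'name' to offset 8, leaving a 4-byte hole after 'id';
 * zeroing the whole struct before use makes the hole deterministic so
 * key comparison in the kernel sees a well-defined byte pattern. */
_Static_assert(offsetof(struct id_dname_key, name) == 8, "dynptr at offset 8");
_Static_assert(sizeof(struct id_dname_key) == 24, "4-byte hole plus 16-byte dynptr");
```

This is consistent with verifier messages such as "dynptr-key expects dynptr at offset 8" expected by the failure programs below.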
Signed-off-by: Hou Tao --- .../bpf/prog_tests/htab_dynkey_test.c | 446 ++++++++++++++++++ .../bpf/progs/htab_dynkey_test_failure.c | 266 +++++++++++ .../bpf/progs/htab_dynkey_test_success.c | 382 +++++++++++++++ 3 files changed, 1094 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c diff --git a/tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c b/tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c new file mode 100644 index 0000000000000..77c3547b553c9 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c @@ -0,0 +1,446 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2025. Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include + +#include "htab_dynkey_test_success.skel.h" +#include "htab_dynkey_test_failure.skel.h" + +struct id_dname_key { + int id; + struct bpf_dynptr name; +}; + +struct dname_key { + struct bpf_dynptr name; +}; + +struct multiple_dynptr_key { + struct dname_key f_1; + unsigned long f_2; + struct id_dname_key f_3; + unsigned long f_4; +}; + +static char *name_list[] = { + "systemd", + "[rcu_sched]", + "[kworker/42:0H-events_highpri]", + "[ksoftirqd/58]", + "[rcu_tasks_trace]", +}; + +#define INIT_VALUE 100 +#define INIT_ID 1000 + +static void setup_pure_dynptr_key_map(int fd) +{ + struct bpf_dynptr key, _cur_key, _next_key; + struct bpf_dynptr *cur_key, *next_key; + bool marked[ARRAY_SIZE(name_list)]; + unsigned int i, next_idx, size; + unsigned long value, got; + char name[2][64]; + char msg[64]; + void *data; + int err; + + /* lookup non-existent keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u bad lookup", i); + /* Use strdup() to ensure that the content pointed by dynptr is + * used for lookup instead of the pointer in dynptr. 
sys_bpf() + * will handle the NULL case properly. + */ + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key); + err = bpf_map_lookup_elem(fd, &key, &value); + ASSERT_EQ(err, -ENOENT, msg); + free(data); + } + + /* update keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u insert", i); + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key); + value = INIT_VALUE + i; + err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST); + ASSERT_OK(err, msg); + free(data); + } + + /* lookup existent keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u lookup", i); + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key); + got = 0; + err = bpf_map_lookup_elem(fd, &key, &got); + ASSERT_OK(err, msg); + free(data); + + value = INIT_VALUE + i; + ASSERT_EQ(got, value, msg); + } + + /* delete keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u delete", i); + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key); + err = bpf_map_delete_elem(fd, &key); + ASSERT_OK(err, msg); + free(data); + } + + /* re-insert keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u re-insert", i); + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key); + value = 0; + err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST); + ASSERT_OK(err, msg); + free(data); + } + + /* overwrite keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u overwrite", i); + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key); + value = INIT_VALUE + i; + err = bpf_map_update_elem(fd, &key, &value, BPF_EXIST); + ASSERT_OK(err, msg); + free(data); + } + + /* get_next keys */ + next_idx = 0; + cur_key = NULL; + next_key = &_next_key; 
+ memset(&marked, 0, sizeof(marked)); + while (true) { + bpf_dynptr_user_init(name[next_idx], sizeof(name[next_idx]), next_key); + err = bpf_map_get_next_key(fd, cur_key, next_key); + if (err) { + ASSERT_EQ(err, -ENOENT, "get_next_key"); + break; + } + + size = next_key->size; + data = next_key->data; + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + if (size == strlen(name_list[i]) + 1 && + !memcmp(name_list[i], data, size)) { + ASSERT_FALSE(marked[i], name_list[i]); + marked[i] = true; + break; + } + } + ASSERT_EQ(next_key->reserved, 0, "reserved"); + + if (!cur_key) + cur_key = &_cur_key; + *cur_key = *next_key; + next_idx ^= 1; + } + + for (i = 0; i < ARRAY_SIZE(marked); i++) + ASSERT_TRUE(marked[i], name_list[i]); + + /* lookup_and_delete all elements except the first one */ + for (i = 1; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u lookup_delete", i); + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key); + got = 0; + err = bpf_map_lookup_and_delete_elem(fd, &key, &got); + ASSERT_OK(err, msg); + free(data); + + value = INIT_VALUE + i; + ASSERT_EQ(got, value, msg); + } + + /* get the key after the first element */ + cur_key = &_cur_key; + strncpy(name[0], name_list[0], sizeof(name[0]) - 1); + name[0][sizeof(name[0]) - 1] = 0; + bpf_dynptr_user_init(name[0], strlen(name[0]) + 1, cur_key); + + next_key = &_next_key; + bpf_dynptr_user_init(name[1], sizeof(name[1]), next_key); + err = bpf_map_get_next_key(fd, cur_key, next_key); + ASSERT_EQ(err, -ENOENT, "get_last"); +} + +static void setup_mixed_dynptr_key_map(int fd) +{ + struct id_dname_key key, _cur_key, _next_key; + struct id_dname_key *cur_key, *next_key; + bool marked[ARRAY_SIZE(name_list)]; + unsigned int i, next_idx, size; + unsigned long value; + char name[2][64]; + char msg[64]; + void *data; + int err; + + /* Zero the hole */ + memset(&key, 0, sizeof(key)); + + /* lookup non-existent keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + 
snprintf(msg, sizeof(msg), "#%u bad lookup", i); + key.id = INIT_ID + i; + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name); + err = bpf_map_lookup_elem(fd, &key, &value); + ASSERT_EQ(err, -ENOENT, msg); + free(data); + } + + /* update keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u insert", i); + key.id = INIT_ID + i; + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name); + value = INIT_VALUE + i; + err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST); + ASSERT_OK(err, msg); + free(data); + } + + /* lookup existent keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + unsigned long got = 0; + + snprintf(msg, sizeof(msg), "#%u lookup", i); + key.id = INIT_ID + i; + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name); + err = bpf_map_lookup_elem(fd, &key, &got); + ASSERT_OK(err, msg); + free(data); + + value = INIT_VALUE + i; + ASSERT_EQ(got, value, msg); + } + + /* delete keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u delete", i); + key.id = INIT_ID + i; + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name); + err = bpf_map_delete_elem(fd, &key); + ASSERT_OK(err, msg); + free(data); + } + + /* re-insert keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u re-insert", i); + key.id = INIT_ID + i; + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name); + value = 0; + err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST); + ASSERT_OK(err, msg); + free(data); + } + + /* overwrite keys */ + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + snprintf(msg, sizeof(msg), "#%u overwrite", i); + key.id = INIT_ID + i; + data = strdup(name_list[i]); + bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name); + value = INIT_VALUE + 
i; + err = bpf_map_update_elem(fd, &key, &value, BPF_EXIST); + ASSERT_OK(err, msg); + free(data); + } + + /* get_next keys */ + next_idx = 0; + cur_key = NULL; + next_key = &_next_key; + memset(&marked, 0, sizeof(marked)); + while (true) { + bpf_dynptr_user_init(name[next_idx], sizeof(name[next_idx]), &next_key->name); + err = bpf_map_get_next_key(fd, cur_key, next_key); + if (err) { + ASSERT_EQ(err, -ENOENT, "last get_next"); + break; + } + + size = next_key->name.size; + data = next_key->name.data; + for (i = 0; i < ARRAY_SIZE(name_list); i++) { + if (size == strlen(name_list[i]) + 1 && + !memcmp(name_list[i], data, size)) { + ASSERT_FALSE(marked[i], name_list[i]); + ASSERT_EQ(next_key->id, INIT_ID + i, name_list[i]); + marked[i] = true; + break; + } + } + ASSERT_EQ(next_key->name.reserved, 0, "reserved"); + + if (!cur_key) + cur_key = &_cur_key; + *cur_key = *next_key; + next_idx ^= 1; + } + + for (i = 0; i < ARRAY_SIZE(marked); i++) + ASSERT_TRUE(marked[i], name_list[i]); +} + +static void setup_multiple_dynptr_key_map(int fd) +{ + struct multiple_dynptr_key key, cur_key, next_key; + unsigned long value; + unsigned int size; + char name[4][64]; + void *data[2]; + int err; + + /* Zero the hole */ + memset(&key, 0, sizeof(key)); + + key.f_2 = 2; + key.f_3.id = 3; + key.f_4 = 4; + + /* lookup a non-existent key */ + data[0] = strdup(name_list[0]); + data[1] = strdup(name_list[1]); + bpf_dynptr_user_init(data[0], strlen(name_list[0]) + 1, &key.f_1.name); + bpf_dynptr_user_init(data[1], strlen(name_list[1]) + 1, &key.f_3.name); + err = bpf_map_lookup_elem(fd, &key, &value); + ASSERT_EQ(err, -ENOENT, "lookup"); + + /* update key */ + value = INIT_VALUE; + err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST); + ASSERT_OK(err, "update"); + free(data[0]); + free(data[1]); + + /* lookup key */ + data[0] = strdup(name_list[0]); + data[1] = strdup(name_list[1]); + bpf_dynptr_user_init(data[0], strlen(name_list[0]) + 1, &key.f_1.name); + bpf_dynptr_user_init(data[1], 
strlen(name_list[1]) + 1, &key.f_3.name); + err = bpf_map_lookup_elem(fd, &key, &value); + ASSERT_OK(err, "lookup"); + ASSERT_EQ(value, INIT_VALUE, "lookup"); + + /* delete key */ + err = bpf_map_delete_elem(fd, &key); + ASSERT_OK(err, "delete"); + free(data[0]); + free(data[1]); + + /* re-insert keys */ + bpf_dynptr_user_init(name_list[0], strlen(name_list[0]) + 1, &key.f_1.name); + bpf_dynptr_user_init(name_list[1], strlen(name_list[1]) + 1, &key.f_3.name); + value = 0; + err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST); + ASSERT_OK(err, "re-insert"); + + /* overwrite keys */ + data[0] = strdup(name_list[0]); + data[1] = strdup(name_list[1]); + bpf_dynptr_user_init(data[0], strlen(name_list[0]) + 1, &key.f_1.name); + bpf_dynptr_user_init(data[1], strlen(name_list[1]) + 1, &key.f_3.name); + value = INIT_VALUE; + err = bpf_map_update_elem(fd, &key, &value, BPF_EXIST); + ASSERT_OK(err, "overwrite"); + free(data[0]); + free(data[1]); + + /* get_next_key */ + bpf_dynptr_user_init(name[0], sizeof(name[0]), &next_key.f_1.name); + bpf_dynptr_user_init(name[1], sizeof(name[1]), &next_key.f_3.name); + err = bpf_map_get_next_key(fd, NULL, &next_key); + ASSERT_OK(err, "first get_next"); + + size = next_key.f_1.name.size; + data[0] = next_key.f_1.name.data; + if (ASSERT_EQ(size, strlen(name_list[0]) + 1, "f_1 size")) + ASSERT_TRUE(!memcmp(name_list[0], data[0], size), "f_1 data"); + ASSERT_EQ(next_key.f_1.name.reserved, 0, "f_1 reserved"); + + ASSERT_EQ(next_key.f_2, 2, "f_2"); + + ASSERT_EQ(next_key.f_3.id, 3, "f_3 id"); + size = next_key.f_3.name.size; + data[0] = next_key.f_3.name.data; + if (ASSERT_EQ(size, strlen(name_list[1]) + 1, "f_3 size")) + ASSERT_TRUE(!memcmp(name_list[1], data[0], size), "f_3 data"); + ASSERT_EQ(next_key.f_3.name.reserved, 0, "f_3 reserved"); + + ASSERT_EQ(next_key.f_4, 4, "f_4"); + + cur_key = next_key; + bpf_dynptr_user_init(name[2], sizeof(name[2]), &next_key.f_1.name); + bpf_dynptr_user_init(name[3], sizeof(name[3]), 
&next_key.f_3.name); + err = bpf_map_get_next_key(fd, &cur_key, &next_key); + ASSERT_EQ(err, -ENOENT, "last get_next_key"); +} + +static void test_htab_dynptr_key(bool pure, bool multiple) +{ + struct htab_dynkey_test_success *skel; + LIBBPF_OPTS(bpf_test_run_opts, opts); + struct bpf_program *prog; + int err; + + skel = htab_dynkey_test_success__open(); + if (!ASSERT_OK_PTR(skel, "open()")) + return; + + prog = pure ? skel->progs.pure_dynptr_key : + (multiple ? skel->progs.multiple_dynptr_key : skel->progs.mixed_dynptr_key); + bpf_program__set_autoload(prog, true); + + err = htab_dynkey_test_success__load(skel); + if (!ASSERT_OK(err, "load()")) + goto out; + + if (pure) { + setup_pure_dynptr_key_map(bpf_map__fd(skel->maps.htab_1)); + setup_pure_dynptr_key_map(bpf_map__fd(skel->maps.htab_2)); + } else if (multiple) { + setup_multiple_dynptr_key_map(bpf_map__fd(skel->maps.htab_4)); + } else { + setup_mixed_dynptr_key_map(bpf_map__fd(skel->maps.htab_3)); + } + + err = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts); + ASSERT_OK(err, "run"); + ASSERT_EQ(opts.retval, 0, "retval"); +out: + htab_dynkey_test_success__destroy(skel); +} + +void test_htab_dynkey_test(void) +{ + if (test__start_subtest("pure_dynptr_key")) + test_htab_dynptr_key(true, false); + if (test__start_subtest("mixed_dynptr_key")) + test_htab_dynptr_key(false, false); + if (test__start_subtest("multiple_dynptr_key")) + test_htab_dynptr_key(false, true); + + RUN_TESTS(htab_dynkey_test_failure); +} diff --git a/tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c b/tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c new file mode 100644 index 0000000000000..2577f2a2fe309 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c @@ -0,0 +1,266 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2025. 
Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include + +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +struct bpf_map; + +struct id_dname_key { + int id; + struct bpf_dynptr name; +}; + +struct dname_id_key { + struct bpf_dynptr name; + int id; +}; + +struct id_name_key { + int id; + char name[20]; +}; + +struct dname_key { + struct bpf_dynptr name; +}; + +struct dname_dname_key { + struct bpf_dynptr name_1; + struct bpf_dynptr name_2; +}; + +struct dname_dname_id_key { + struct dname_dname_key names; + __u64 id; +}; + +struct dname_id_id_id_key { + struct bpf_dynptr name; + __u64 id[3]; +}; + +struct dname_dname_dname_key { + struct bpf_dynptr name_1; + struct bpf_dynptr name_2; + struct bpf_dynptr name_3; +}; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, struct id_dname_key); + __type(value, unsigned long); +} htab_1 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, struct dname_key); + __type(value, unsigned long); +} htab_2 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, struct dname_dname_id_key); + __type(value, unsigned long); +} htab_3 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, struct bpf_dynptr); + __type(value, unsigned long); +} htab_4 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_RINGBUF); + __uint(max_entries, 4096); +} ringbuf SEC(".maps"); + +char dynptr_buf[32] = {}; + +/* uninitialized dynptr */ +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("dynptr-key expects dynptr at offset 8") +int BPF_PROG(uninit_dynptr) +{ + struct id_dname_key key; + + key.id = 100; + bpf_map_lookup_elem(&htab_1, &key); + + return 0; +} + +/* 
invalid dynptr */ +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("dynptr-key expects dynptr at offset 8") +int BPF_PROG(invalid_dynptr) +{ + struct id_dname_key key; + + key.id = 100; + bpf_ringbuf_reserve_dynptr(&ringbuf, 10, 0, &key.name); + bpf_ringbuf_discard_dynptr(&key.name, 0); + bpf_map_lookup_elem(&htab_1, &key); + + return 0; +} + +/* expect no-dynptr got dynptr */ +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("dynptr-key expects non-dynptr at offset 0") +int BPF_PROG(invalid_non_dynptr) +{ + struct dname_id_key key; + + __builtin_memcpy(dynptr_buf, "test", 4); + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name); + key.id = 100; + bpf_map_lookup_elem(&htab_1, &key); + + return 0; +} + +/* expect dynptr get non-dynptr */ +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("dynptr-key expects dynptr at offset 8") +int BPF_PROG(no_dynptr) +{ + struct id_name_key key; + + key.id = 100; + __builtin_memset(key.name, 0, sizeof(key.name)); + __builtin_memcpy(key.name, "test", 4); + bpf_map_lookup_elem(&htab_1, &key); + + return 0; +} + +/* malformed */ +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("malformed dynptr-key at offset 8") +int BPF_PROG(malformed_dynptr) +{ + struct dname_dname_key key; + + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_1); + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_2); + + bpf_map_lookup_elem(&htab_2, (void *)&key + 8); + + return 0; +} + +/* expect no-dynptr got dynptr */ +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("dynptr-key expects non-dynptr at offset 32") +int BPF_PROG(invalid_non_dynptr_2) +{ + struct dname_dname_dname_key key; + + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_1); + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_2); + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_3); + + bpf_map_lookup_elem(&htab_3, &key); + + return 0; +} + +/* expect dynptr get non-dynptr */ +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("dynptr-key 
expects dynptr at offset 16") +int BPF_PROG(no_dynptr_2) +{ + struct dname_id_id_id_key key; + + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name); + bpf_map_lookup_elem(&htab_3, &key); + + return 0; +} + +/* misaligned */ +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("R2 misaligned offset -28 for dynptr-key") +int BPF_PROG(misaligned_dynptr) +{ + struct dname_dname_key key = {}; + + bpf_map_lookup_elem(&htab_1, (char *)&key + 4); + + return 0; +} + +/* variable offset */ +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("R2 variable offset prohibited for dynptr-key") +int BPF_PROG(variable_offset_dynptr) +{ + struct bpf_dynptr dynptr_1; + struct bpf_dynptr dynptr_2; + char *key; + + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &dynptr_1); + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &dynptr_2); + + key = (char *)&dynptr_2; + key = key + (bpf_get_prandom_u32() & 1) * 16; + + bpf_map_lookup_elem(&htab_2, key); + + return 0; +} + +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("map dynptr-key requires stack ptr but got map_value") +int BPF_PROG(map_value_as_key) +{ + bpf_map_lookup_elem(&htab_1, dynptr_buf); + + return 0; +} + +static int lookup_htab(struct bpf_map *map, struct id_dname_key *key, void *value, void *data) +{ + bpf_map_lookup_elem(&htab_1, key); + return 0; +} + +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("cannot pass map_type 1 into func bpf_for_each_map_elem") +int BPF_PROG(map_key_as_key) +{ + bpf_for_each_map_elem(&htab_1, lookup_htab, NULL, 0); + return 0; +} + +__noinline __weak int subprog_lookup_htab(struct bpf_dynptr *dynptr) +{ + bpf_map_lookup_elem(&htab_4, dynptr); + return 0; +} + +SEC("fentry/" SYS_PREFIX "sys_nanosleep") +__failure __msg("R2 type=dynptr_ptr expected=") +int BPF_PROG(subprog_dynptr) +{ + struct bpf_dynptr dynptr; + + bpf_dynptr_from_mem(dynptr_buf, 4, 0, &dynptr); + subprog_lookup_htab(&dynptr); + return 0; +} diff --git a/tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c 
b/tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c new file mode 100644 index 0000000000000..84e6931cc19c0 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c @@ -0,0 +1,382 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2025. Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include + +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +struct pure_dynptr_key { + struct bpf_dynptr name; +}; + +struct mixed_dynptr_key { + int id; + struct bpf_dynptr name; +}; + +struct multiple_dynptr_key { + struct pure_dynptr_key f_1; + unsigned long f_2; + struct mixed_dynptr_key f_3; + unsigned long f_4; +}; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, struct bpf_dynptr); + __type(value, unsigned long); +} htab_1 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, struct pure_dynptr_key); + __type(value, unsigned long); +} htab_2 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, struct mixed_dynptr_key); + __type(value, unsigned long); +} htab_3 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, struct multiple_dynptr_key); + __type(value, unsigned long); +} htab_4 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_RINGBUF); + __uint(max_entries, 4096); +} ringbuf SEC(".maps"); + +char dynptr_buf[2][32] = {{}, {}}; + +static const char systemd_name[] = "systemd"; +static const char udevd_name[] = "udevd"; +static const char rcu_sched_name[] = "[rcu_sched]"; + +struct bpf_map; + +static int test_pure_dynptr_key_htab(struct bpf_map *htab) +{ + unsigned long new_value, *value; + struct bpf_dynptr key; + int err = 0; + + /* Lookup a 
existent key */ + __builtin_memcpy(dynptr_buf[0], systemd_name, sizeof(systemd_name)); + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(systemd_name), 0, &key); + value = bpf_map_lookup_elem(htab, &key); + if (!value) { + err = 1; + goto out; + } + if (*value != 100) { + err = 2; + goto out; + } + + /* Look up a non-existent key */ + __builtin_memcpy(dynptr_buf[0], udevd_name, sizeof(udevd_name)); + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key); + value = bpf_map_lookup_elem(htab, &key); + if (value) { + err = 3; + goto out; + } + + /* Insert a new key */ + new_value = 42; + err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST); + if (err) { + err = 4; + goto out; + } + + /* Insert an existent key */ + bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(udevd_name), 0, &key); + err = bpf_dynptr_write(&key, 0, (void *)udevd_name, sizeof(udevd_name), 0); + if (err) { + bpf_ringbuf_discard_dynptr(&key, 0); + err = 5; + goto out; + } + + err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST); + bpf_ringbuf_discard_dynptr(&key, 0); + if (err != -EEXIST) { + err = 6; + goto out; + } + + /* Lookup it again */ + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key); + value = bpf_map_lookup_elem(htab, &key); + if (!value) { + err = 7; + goto out; + } + if (*value != 42) { + err = 8; + goto out; + } + + /* Delete then lookup it */ + bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(udevd_name), 0, &key); + err = bpf_dynptr_write(&key, 0, (void *)udevd_name, sizeof(udevd_name), 0); + if (err) { + bpf_ringbuf_discard_dynptr(&key, 0); + err = 9; + goto out; + } + err = bpf_map_delete_elem(htab, &key); + bpf_ringbuf_discard_dynptr(&key, 0); + if (err) { + err = 10; + goto out; + } + + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key); + value = bpf_map_lookup_elem(htab, &key); + if (value) { + err = 10; + goto out; + } +out: + return err; +} + +static int test_mixed_dynptr_key_htab(struct bpf_map *htab) +{ + unsigned long new_value, 
*value; + char udevd_name[] = "udevd"; + struct mixed_dynptr_key key; + int err = 0; + + __builtin_memset(&key, 0, sizeof(key)); + key.id = 1000; + + /* Lookup a existent key */ + __builtin_memcpy(dynptr_buf[0], systemd_name, sizeof(systemd_name)); + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(systemd_name), 0, &key.name); + value = bpf_map_lookup_elem(htab, &key); + if (!value) { + err = 1; + goto out; + } + if (*value != 100) { + err = 2; + goto out; + } + + /* Look up a non-existent key */ + __builtin_memcpy(dynptr_buf[0], udevd_name, sizeof(udevd_name)); + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key.name); + value = bpf_map_lookup_elem(htab, &key); + if (value) { + err = 3; + goto out; + } + + /* Insert a new key */ + new_value = 42; + err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST); + if (err) { + err = 4; + goto out; + } + + /* Insert an existent key */ + bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(udevd_name), 0, &key.name); + err = bpf_dynptr_write(&key.name, 0, (void *)udevd_name, sizeof(udevd_name), 0); + if (err) { + bpf_ringbuf_discard_dynptr(&key.name, 0); + err = 5; + goto out; + } + + err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST); + bpf_ringbuf_discard_dynptr(&key.name, 0); + if (err != -EEXIST) { + err = 6; + goto out; + } + + /* Lookup it again */ + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key.name); + value = bpf_map_lookup_elem(htab, &key); + if (!value) { + err = 7; + goto out; + } + if (*value != 42) { + err = 8; + goto out; + } + + /* Delete then lookup it */ + bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(udevd_name), 0, &key.name); + err = bpf_dynptr_write(&key.name, 0, (void *)udevd_name, sizeof(udevd_name), 0); + if (err) { + bpf_ringbuf_discard_dynptr(&key.name, 0); + err = 9; + goto out; + } + err = bpf_map_delete_elem(htab, &key); + bpf_ringbuf_discard_dynptr(&key.name, 0); + if (err) { + err = 10; + goto out; + } + + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 
0, &key.name); + value = bpf_map_lookup_elem(htab, &key); + if (value) { + err = 10; + goto out; + } +out: + return err; +} + +static int test_multiple_dynptr_key_htab(struct bpf_map *htab) +{ + unsigned long new_value, *value; + struct multiple_dynptr_key key; + int err = 0; + + __builtin_memset(&key, 0, sizeof(key)); + key.f_2 = 2; + key.f_3.id = 3; + key.f_4 = 4; + + /* Lookup a existent key */ + __builtin_memcpy(dynptr_buf[0], systemd_name, sizeof(systemd_name)); + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(systemd_name), 0, &key.f_1.name); + __builtin_memcpy(dynptr_buf[1], rcu_sched_name, sizeof(rcu_sched_name)); + bpf_dynptr_from_mem(dynptr_buf[1], sizeof(rcu_sched_name), 0, &key.f_3.name); + value = bpf_map_lookup_elem(htab, &key); + if (!value) { + err = 1; + goto out; + } + if (*value != 100) { + err = 2; + goto out; + } + + /* Look up a non-existent key */ + bpf_dynptr_from_mem(dynptr_buf[1], sizeof(rcu_sched_name), 0, &key.f_1.name); + bpf_dynptr_from_mem(dynptr_buf[0], sizeof(systemd_name), 0, &key.f_3.name); + value = bpf_map_lookup_elem(htab, &key); + if (value) { + err = 3; + goto out; + } + + /* Insert a new key */ + new_value = 42; + err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST); + if (err) { + err = 4; + goto out; + } + + /* Insert an existent key */ + bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(rcu_sched_name), 0, &key.f_1.name); + err = bpf_dynptr_write(&key.f_1.name, 0, (void *)rcu_sched_name, sizeof(rcu_sched_name), 0); + if (err) { + bpf_ringbuf_discard_dynptr(&key.f_1.name, 0); + err = 5; + goto out; + } + err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST); + bpf_ringbuf_discard_dynptr(&key.f_1.name, 0); + if (err != -EEXIST) { + err = 6; + goto out; + } + + /* Lookup a non-existent key */ + bpf_dynptr_from_mem(dynptr_buf[1], sizeof(rcu_sched_name), 0, &key.f_1.name); + key.f_4 = 0; + value = bpf_map_lookup_elem(htab, &key); + if (value) { + err = 7; + goto out; + } + + /* Lookup an existent key */ + key.f_4 = 4; 
+ value = bpf_map_lookup_elem(htab, &key); + if (!value) { + err = 8; + goto out; + } + if (*value != 42) { + err = 9; + goto out; + } + + /* Delete the newly-inserted key */ + bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(systemd_name), 0, &key.f_3.name); + err = bpf_dynptr_write(&key.f_3.name, 0, (void *)systemd_name, sizeof(systemd_name), 0); + if (err) { + bpf_ringbuf_discard_dynptr(&key.f_3.name, 0); + err = 10; + goto out; + } + err = bpf_map_delete_elem(htab, &key); + if (err) { + bpf_ringbuf_discard_dynptr(&key.f_3.name, 0); + err = 11; + goto out; + } + + /* Lookup it again */ + value = bpf_map_lookup_elem(htab, &key); + bpf_ringbuf_discard_dynptr(&key.f_3.name, 0); + if (value) { + err = 12; + goto out; + } +out: + return err; +} + +SEC("?raw_tp") +int BPF_PROG(pure_dynptr_key) +{ + int err; + + err = test_pure_dynptr_key_htab((struct bpf_map *)&htab_1); + err |= test_pure_dynptr_key_htab((struct bpf_map *)&htab_2) << 8; + + return err; +} + +SEC("?raw_tp") +int BPF_PROG(mixed_dynptr_key) +{ + return test_mixed_dynptr_key_htab((struct bpf_map *)&htab_3); +} + +SEC("?raw_tp") +int BPF_PROG(multiple_dynptr_key) +{ + return test_multiple_dynptr_key_htab((struct bpf_map *)&htab_4); +} From patchwork Thu Mar 27 08:34:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 14030850 X-Patchwork-Delegate: bpf@iogearbox.net Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9004A20551B for ; Thu, 27 Mar 2025 08:23:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743063786; cv=none; 
From: Hou Tao
To: bpf@vger.kernel.org
Cc: Martin KaFai Lau , Alexei Starovoitov , Andrii Nakryiko , Eduard Zingerman , Song Liu , Hao Luo , Yonghong Song , Daniel Borkmann , KP Singh , Stanislav Fomichev , Jiri Olsa , John Fastabend , houtao1@huawei.com
Subject: [PATCH bpf-next v3 16/16] selftests/bpf: Add benchmark for dynptr key support in hash map
Date: Thu, 27 Mar 2025 16:34:55 +0800
Message-Id: <20250327083455.848708-17-houtao@huaweicloud.com>
X-Mailer: git-send-email 2.29.2
In-Reply-To: 
<20250327083455.848708-1-houtao@huaweicloud.com>
References: <20250327083455.848708-1-houtao@huaweicloud.com>

From: Hou Tao

The patch adds a benchmark to compare the lookup and update/delete performance of a normal hash map and a dynptr-keyed hash map. It also compares the memory usage of the two maps after filling them up. The benchmark simulates the case where the map key is composed of an 8-byte integer and a variable-size string; for now the integer simply stores the length of the string. 
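The memory-usage comparison follows from how much each map must store per key: the normal map embeds a fixed MAX_STR_SIZE slot in every key, while a dynptr key only needs the valid bytes plus an 8-byte length. A minimal user-space sketch of the two layouts (hedged: `norm_key_bytes` and `dynptr_payload_bytes` are illustrative helpers, not part of the patch, and MAX_STR_SIZE is assumed to be the benchmark's default max_size of 256):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STR_SIZE 256	/* assumption: benchmark default max_size */

/* Fixed-size key of the normal hash map: the string slot is always
 * MAX_STR_SIZE bytes, however short the actual string is. */
struct norm_key {
	uint64_t cookie;
	char desc[MAX_STR_SIZE];
};

/* Variable-size string a dynptr key refers to: only the valid bytes
 * (plus the 8-byte length) need to be stored. */
struct var_size_str {
	uint64_t len;
	unsigned char data[];
};

static size_t norm_key_bytes(size_t str_len)
{
	(void)str_len;		/* length is irrelevant: the layout is fixed */
	return sizeof(struct norm_key);
}

static size_t dynptr_payload_bytes(size_t str_len)
{
	return sizeof(struct var_size_str) + str_len;
}
```

For a 16-byte string this gives 264 bytes of fixed key versus 24 bytes of string payload; real map memory adds per-element overhead, but this per-key difference is the shape of the roughly 65 MiB versus 35 MiB gap in the results below.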
These strings are randomly generated by default, but they can also be loaded from an external file (e.g., the output of awk '{print $3}' /proc/kallsyms). The key definitions for the dynptr-keyed and the normal hash map are shown below:

struct dynptr_key {
	__u64 cookie;
	struct bpf_dynptr desc;
};

struct norm_key {
	__u64 cookie;
	char desc[MAX_STR_SIZE];
};

The lookup and update procedures first look up an array map to get the key for the hash map. The value returned from the array has the same layout as norm_key. The normal hash map uses that returned value directly to manipulate the map. The dynptr-keyed hash map first constructs a bpf_dynptr object from the returned value (the cookie holds the string length), then passes the constructed key to the map. Because the lookup procedure is lockless, each producer in the lookup test looks up the whole hash map. Update and deletion take locks, however, so each producer in the update test updates only its own part of the hash map. 
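The dynptr-key construction step described above can be modeled in plain user-space C (a sketch under assumptions: in a real BPF program struct bpf_dynptr is opaque and is initialized with bpf_dynptr_from_mem(); here a `model_dynptr` pointer/size pair stands in for it, and `build_dynkey` is an illustrative helper, not from the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* User-space stand-in for the kernel's opaque struct bpf_dynptr. */
struct model_dynptr {
	const unsigned char *data;
	uint64_t size;
};

/* Layout of the value fetched from the array map: the first 8 bytes
 * hold the number of valid string bytes that follow. */
struct var_size_str {
	uint64_t len;
	unsigned char data[];
};

/* Model key: cookie carries the string length, desc refers to the
 * string bytes instead of embedding a fixed-size copy. */
struct dynkey_key {
	uint64_t cookie;
	struct model_dynptr desc;
};

/* Mirrors the step the benchmark's BPF program performs with
 * bpf_dynptr_from_mem() before calling bpf_map_lookup_elem(). */
static struct dynkey_key build_dynkey(const struct var_size_str *v)
{
	struct dynkey_key key;

	key.cookie = v->len;
	key.desc.data = v->data;
	key.desc.size = v->len;
	return key;
}
```

The point of the model is that building the key is O(1): no copy of the string into a fixed-size buffer is needed, only a descriptor over bytes that already exist.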
The following are the benchmark results when running the benchmark under an 8-CPU VM:

(1) Randomly generate 128K strings (max_size=256, entries=128K)

ENTRIES=131072 ./benchs/run_bench_dynptr_key.sh

normal hash map
===============
htab-lookup-p1-131072 2.779 ± 0.091M/s (drops 0.006 ± 0.000M/s, mem 64.984 MiB)
htab-lookup-p2-131072 5.504 ± 0.060M/s (drops 0.013 ± 0.000M/s, mem 64.966 MiB)
htab-lookup-p4-131072 10.791 ± 0.054M/s (drops 0.025 ± 0.000M/s, mem 64.984 MiB)
htab-lookup-p8-131072 20.947 ± 0.053M/s (drops 0.046 ± 0.000M/s, mem 64.984 MiB)
htab-neg-lookup-p1-131072 0.000 ± 0.000M/s (drops 3.464 ± 0.088M/s, mem 64.984 MiB)
htab-neg-lookup-p2-131072 0.000 ± 0.000M/s (drops 6.731 ± 0.063M/s, mem 64.984 MiB)
htab-neg-lookup-p4-131072 0.000 ± 0.000M/s (drops 13.380 ± 0.174M/s, mem 64.984 MiB)
htab-neg-lookup-p8-131072 0.000 ± 0.000M/s (drops 26.433 ± 0.398M/s, mem 64.966 MiB)
htab-update-p1-131072 1.988 ± 0.025M/s (drops 0.000 ± 0.000M/s, mem 64.968 MiB)
htab-update-p2-131072 3.225 ± 0.032M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)
htab-update-p4-131072 6.982 ± 0.084M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)
htab-update-p8-131072 13.103 ± 0.180M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)

dynptr-keyed hash map
=====================
htab-lookup-p1-131072 3.477 ± 0.045M/s (drops 0.007 ± 0.000M/s, mem 34.905 MiB)
htab-lookup-p2-131072 6.798 ± 0.056M/s (drops 0.015 ± 0.000M/s, mem 34.969 MiB)
htab-lookup-p4-131072 13.600 ± 0.161M/s (drops 0.029 ± 0.000M/s, mem 34.911 MiB)
htab-lookup-p8-131072 25.831 ± 0.559M/s (drops 0.055 ± 0.001M/s, mem 34.907 MiB)
htab-neg-lookup-p1-131072 0.000 ± 0.000M/s (drops 4.748 ± 0.067M/s, mem 34.957 MiB)
htab-neg-lookup-p2-131072 0.000 ± 0.000M/s (drops 9.420 ± 0.049M/s, mem 34.975 MiB)
htab-neg-lookup-p4-131072 0.000 ± 0.000M/s (drops 19.186 ± 0.326M/s, mem 34.953 MiB)
htab-neg-lookup-p8-131072 0.000 ± 0.000M/s (drops 38.055 ± 0.613M/s, mem 34.995 MiB)
htab-update-p1-131072 2.072 ± 0.046M/s (drops 0.000 ± 0.000M/s, mem 34.950 MiB)
htab-update-p2-131072 2.935 ± 0.066M/s (drops 0.000 ± 0.000M/s, mem 34.946 MiB)
htab-update-p4-131072 6.371 ± 0.113M/s (drops 0.000 ± 0.000M/s, mem 34.949 MiB)
htab-update-p8-131072 11.646 ± 0.330M/s (drops 0.000 ± 0.000M/s, mem 34.924 MiB)

(2) Use strings in /proc/kallsyms (max_size=82, entries=150K)

STR_FILE=kallsyms.txt ./benchs/run_bench_dynptr_key.sh

normal hash map
===============
htab-lookup-p1-kallsyms.txt 6.508 ± 0.186M/s (drops 0.000 ± 0.000M/s, mem 31.026 MiB)
htab-lookup-p2-kallsyms.txt 13.381 ± 0.270M/s (drops 0.000 ± 0.000M/s, mem 31.026 MiB)
htab-lookup-p4-kallsyms.txt 26.838 ± 0.465M/s (drops 0.000 ± 0.000M/s, mem 31.026 MiB)
htab-lookup-p8-kallsyms.txt 51.290 ± 0.880M/s (drops 0.000 ± 0.000M/s, mem 31.026 MiB)
htab-neg-lookup-p1-kallsyms.txt 0.000 ± 0.000M/s (drops 7.771 ± 0.242M/s, mem 31.026 MiB)
htab-neg-lookup-p2-kallsyms.txt 0.000 ± 0.000M/s (drops 15.626 ± 0.155M/s, mem 31.026 MiB)
htab-neg-lookup-p4-kallsyms.txt 0.000 ± 0.000M/s (drops 31.766 ± 0.442M/s, mem 31.026 MiB)
htab-neg-lookup-p8-kallsyms.txt 0.000 ± 0.000M/s (drops 63.766 ± 1.379M/s, mem 31.026 MiB)
htab-update-p1-kallsyms.txt 3.240 ± 0.152M/s (drops 0.000 ± 0.000M/s, mem 31.028 MiB)
htab-update-p2-kallsyms.txt 5.268 ± 0.078M/s (drops 0.000 ± 0.000M/s, mem 31.028 MiB)
htab-update-p4-kallsyms.txt 11.192 ± 0.201M/s (drops 0.000 ± 0.000M/s, mem 31.028 MiB)
htab-update-p8-kallsyms.txt 21.098 ± 0.179M/s (drops 0.000 ± 0.000M/s, mem 31.028 MiB)

dynptr-keyed hash map
=====================
htab-lookup-p1-kallsyms.txt 6.366 ± 0.247M/s (drops 0.000 ± 0.000M/s, mem 24.572 MiB)
htab-lookup-p2-kallsyms.txt 12.477 ± 0.223M/s (drops 0.000 ± 0.000M/s, mem 24.572 MiB)
htab-lookup-p4-kallsyms.txt 25.797 ± 0.593M/s (drops 0.000 ± 0.000M/s, mem 24.572 MiB)
htab-lookup-p8-kallsyms.txt 51.070 ± 1.471M/s (drops 0.000 ± 0.000M/s, mem 24.572 MiB)
htab-neg-lookup-p1-kallsyms.txt 0.000 ± 0.000M/s (drops 7.600 ± 0.183M/s, mem 24.572 MiB)
htab-neg-lookup-p2-kallsyms.txt 0.000 ± 0.000M/s (drops 15.182 ± 
0.193M/s, mem 24.572 MiB) htab-neg-lookup-p4-kallsyms.txt 0.000 ± 0.000M/s (drops 30.680 ± 0.496M/s, mem 24.572 MiB) htab-neg-lookup-p8-kallsyms.txt 0.000 ± 0.000M/s (drops 60.880 ± 0.754M/s, mem 24.572 MiB) htab-update-p1-kallsyms.txt 2.868 ± 0.078M/s (drops 0.000 ± 0.000M/s, mem 24.574 MiB) htab-update-p2-kallsyms.txt 4.357 ± 0.039M/s (drops 0.000 ± 0.000M/s, mem 24.574 MiB) htab-update-p4-kallsyms.txt 9.149 ± 0.131M/s (drops 0.000 ± 0.000M/s, mem 24.574 MiB) htab-update-p8-kallsyms.txt 16.804 ± 0.425M/s (drops 0.000 ± 0.000M/s, mem 24.574 MiB) Signed-off-by: Hou Tao --- tools/testing/selftests/bpf/Makefile | 2 + tools/testing/selftests/bpf/bench.c | 14 + .../selftests/bpf/benchs/bench_dynptr_key.c | 648 ++++++++++++++++++ .../bpf/benchs/run_bench_dynptr_key.sh | 51 ++ .../selftests/bpf/progs/dynptr_key_bench.c | 249 +++++++ 5 files changed, 964 insertions(+) create mode 100644 tools/testing/selftests/bpf/benchs/bench_dynptr_key.c create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_dynptr_key.sh create mode 100644 tools/testing/selftests/bpf/progs/dynptr_key_bench.c diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index c0a8207a50d94..4a212845dce25 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -812,6 +812,7 @@ $(OUTPUT)/bench_local_storage_create.o: $(OUTPUT)/bench_local_storage_create.ske $(OUTPUT)/bench_bpf_hashmap_lookup.o: $(OUTPUT)/bpf_hashmap_lookup.skel.h $(OUTPUT)/bench_htab_mem.o: $(OUTPUT)/htab_mem_bench.skel.h $(OUTPUT)/bench_bpf_crypto.o: $(OUTPUT)/crypto_bench.skel.h +$(OUTPUT)/bench_dynptr_key.o: $(OUTPUT)/dynptr_key_bench.skel.h $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ) $(OUTPUT)/bench: LDLIBS += -lm $(OUTPUT)/bench: $(OUTPUT)/bench.o \ @@ -832,6 +833,7 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \ $(OUTPUT)/bench_local_storage_create.o \ $(OUTPUT)/bench_htab_mem.o \ $(OUTPUT)/bench_bpf_crypto.o \ + $(OUTPUT)/bench_dynptr_key.o \ # $(call 
msg,BINARY,,$@) $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@ diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c index 1bd403a5ef7b3..a47077154ff0f 100644 --- a/tools/testing/selftests/bpf/bench.c +++ b/tools/testing/selftests/bpf/bench.c @@ -283,6 +283,7 @@ extern struct argp bench_local_storage_create_argp; extern struct argp bench_htab_mem_argp; extern struct argp bench_trigger_batch_argp; extern struct argp bench_crypto_argp; +extern struct argp bench_dynptr_key_argp; static const struct argp_child bench_parsers[] = { { &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 }, @@ -297,6 +298,7 @@ static const struct argp_child bench_parsers[] = { { &bench_htab_mem_argp, 0, "hash map memory benchmark", 0 }, { &bench_trigger_batch_argp, 0, "BPF triggering benchmark", 0 }, { &bench_crypto_argp, 0, "bpf crypto benchmark", 0 }, + { &bench_dynptr_key_argp, 0, "dynptr key benchmark", 0 }, {}, }; @@ -549,6 +551,12 @@ extern const struct bench bench_local_storage_create; extern const struct bench bench_htab_mem; extern const struct bench bench_crypto_encrypt; extern const struct bench bench_crypto_decrypt; +extern const struct bench bench_norm_htab_lookup; +extern const struct bench bench_dynkey_htab_lookup; +extern const struct bench bench_norm_htab_neg_lookup; +extern const struct bench bench_dynkey_htab_neg_lookup; +extern const struct bench bench_norm_htab_update; +extern const struct bench bench_dynkey_htab_update; static const struct bench *benchs[] = { &bench_count_global, @@ -609,6 +617,12 @@ static const struct bench *benchs[] = { &bench_htab_mem, &bench_crypto_encrypt, &bench_crypto_decrypt, + &bench_norm_htab_lookup, + &bench_dynkey_htab_lookup, + &bench_norm_htab_neg_lookup, + &bench_dynkey_htab_neg_lookup, + &bench_norm_htab_update, + &bench_dynkey_htab_update, }; static void find_benchmark(void) diff --git a/tools/testing/selftests/bpf/benchs/bench_dynptr_key.c 
b/tools/testing/selftests/bpf/benchs/bench_dynptr_key.c new file mode 100644 index 0000000000000..4dd83c52a4d11 --- /dev/null +++ b/tools/testing/selftests/bpf/benchs/bench_dynptr_key.c @@ -0,0 +1,648 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2025. Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include "bench.h" +#include "bpf_util.h" +#include "cgroup_helpers.h" +#include "testing_helpers.h" + +#include "dynptr_key_bench.skel.h" + +enum { + NORM_HTAB = 0, + DYNPTR_KEY_HTAB, +}; + +static struct dynptr_key_ctx { + struct dynptr_key_bench *skel; + int cgrp_dfd; + u64 map_slab_mem; +} ctx; + +static struct { + const char *file; + __u32 entries; + __u32 max_size; +} args = { + .max_size = 256, +}; + +struct run_stat { + __u64 stats[2]; +}; + +struct dynkey_key { + /* prevent unnecessary hole */ + __u64 cookie; + struct bpf_dynptr desc; +}; + +struct var_size_str { + /* the same size as cookie */ + __u64 len; + unsigned char data[]; +}; + +enum { + ARG_DATA_FILE = 11001, + ARG_DATA_ENTRIES = 11002, + ARG_MAX_SIZE = 11003, +}; + +static const struct argp_option opts[] = { + { "file", ARG_DATA_FILE, "DATA-FILE", 0, "Set data file" }, + { "entries", ARG_DATA_ENTRIES, "DATA-ENTRIES", 0, "Set data entries" }, + { "max_size", ARG_MAX_SIZE, "MAX-SIZE", 0, "Set data max size" }, + {}, +}; + +static error_t dynptr_key_parse_arg(int key, char *arg, struct argp_state *state) +{ + switch (key) { + case ARG_DATA_FILE: + args.file = strdup(arg); + if (!args.file) { + fprintf(stderr, "no mem for file name\n"); + argp_usage(state); + } + break; + case ARG_DATA_ENTRIES: + args.entries = strtoul(arg, NULL, 10); + break; + case ARG_MAX_SIZE: + args.max_size = strtoul(arg, NULL, 10); + break; + default: + return ARGP_ERR_UNKNOWN; + } + + return 0; +} + +const struct argp bench_dynptr_key_argp = { + .options = opts, + .parser = dynptr_key_parse_arg, +}; + +static int count_nr_item(const char *name, char *buf, size_t size, unsigned int *nr_items) 
+{ + unsigned int i = 0; + FILE *file; + int err; + + file = fopen(name, "rb"); + if (!file) { + fprintf(stderr, "open %s err %s\n", name, strerror(errno)); + return -1; + } + + err = 0; + while (true) { + unsigned int len; + char *got; + + got = fgets(buf, size, file); + if (!got) { + if (!feof(file)) { + fprintf(stderr, "read file %s error\n", name); + err = -1; + } + break; + } + + len = strlen(got); + if (len && got[len - 1] == '\n') { + got[len - 1] = 0; + len -= 1; + } + if (!len) + continue; + i++; + } + fclose(file); + + if (!err) + *nr_items = i; + + return err; +} + +static int parse_data_set(const char *name, struct var_size_str ***set, unsigned int *nr, + unsigned int *max_len) +{ +#define FILE_DATA_MAX_SIZE 4096 + unsigned int i, nr_items, item_max_len; + char line[FILE_DATA_MAX_SIZE + 1]; + struct var_size_str **items; + struct var_size_str *cur; + int err = 0; + FILE *file; + char *got; + + if (count_nr_item(name, line, sizeof(line), &nr_items)) + return -1; + if (!nr_items) { + fprintf(stderr, "empty file ?\n"); + return -1; + } + fprintf(stdout, "%u items in %s\n", nr_items, name); + /* Only use part of data in the file */ + if (*nr && nr_items > *nr) + nr_items = *nr; + + file = fopen(name, "rb"); + if (!file) { + fprintf(stderr, "open %s err %s\n", name, strerror(errno)); + return -1; + } + + items = (struct var_size_str **)calloc(nr_items, sizeof(*items) + FILE_DATA_MAX_SIZE); + if (!items) { + fprintf(stderr, "no mem for items\n"); + err = -1; + goto out; + } + + i = 0; + item_max_len = 0; + cur = (void *)items + sizeof(*items) * nr_items; + while (true) { + unsigned int len; + + got = fgets(line, sizeof(line), file); + if (!got) { + if (!feof(file)) { + fprintf(stderr, "read file %s error\n", name); + err = -1; + } + break; + } + + len = strlen(got); + if (len && got[len - 1] == '\n') { + got[len - 1] = 0; + len -= 1; + } + if (!len) + continue; + + if (i >= nr_items) + break; + + if (len > item_max_len) + item_max_len = len; + cur->len = len; 
+ memcpy(cur->data, got, len); + items[i++] = cur; + cur = (void *)cur + FILE_DATA_MAX_SIZE; + } + + if (!err) { + if (i != nr_items) + fprintf(stdout, "few lines in %s (exp %u got %u)\n", name, nr_items, i); + *nr = i; + *set = items; + *max_len = item_max_len; + } else { + free(items); + } + +out: + fclose(file); + return err; +} + +static int gen_data_set(unsigned int max_size, unsigned int nr, struct var_size_str ***set, + unsigned int *max_len) +{ +/* Due to the limitation of bpf memory allocator */ +#define GEN_DATA_MAX_SIZE 4088 + struct var_size_str **items; + size_t ptr_size, data_size; + struct var_size_str *cur; + unsigned int i; + size_t left; + ssize_t got; + int err = 0; + void *dst; + + ptr_size = nr * sizeof(*items); + data_size = nr * (sizeof(*cur) + max_size); + items = (struct var_size_str **)malloc(ptr_size + data_size); + if (!items) { + fprintf(stderr, "no mem for items\n"); + err = -1; + goto out; + } + + cur = (void *)items + ptr_size; + dst = cur; + left = data_size; + while (left > 0) { + got = syscall(__NR_getrandom, dst, left, 0); + if (got <= 0) { + fprintf(stderr, "getrandom error %s got %zd\n", strerror(errno), got); + err = -1; + goto out; + } + left -= got; + dst += got; + } + + for (i = 0; i < nr; i++) { + cur->len &= (max_size - 1); + cur->len += 1; + if (cur->len > GEN_DATA_MAX_SIZE) + cur->len = GEN_DATA_MAX_SIZE; + items[i] = cur; + memset(cur->data + cur->len, 0, max_size - cur->len); + cur = (void *)cur + (sizeof(*cur) + max_size); + } + fprintf(stdout, "generate %u random keys (max size %u)\n", nr, max_size); + + *set = items; + *max_len = max_size <= GEN_DATA_MAX_SIZE ? 
max_size : GEN_DATA_MAX_SIZE; +out: + if (err && items) + free(items); + return err; +} + +static inline bool is_pow_of_2(size_t x) +{ + return x && (x & (x - 1)) == 0; +} + +static void dynptr_key_validate(void) +{ + if (env.consumer_cnt != 0) { + fprintf(stderr, "dynptr_key benchmark doesn't support consumer!\n"); + exit(1); + } + + if (!args.file && !args.entries) { + fprintf(stderr, "must specify entries when use random generated data set\n"); + exit(1); + } + + if (args.file && access(args.file, R_OK)) { + fprintf(stderr, "data file is un-accessible\n"); + exit(1); + } + + if (args.entries && !is_pow_of_2(args.max_size)) { + fprintf(stderr, "invalid max size %u (should be power-of-two)\n", args.max_size); + exit(1); + } +} + +static void dynptr_key_init_map_opts(struct dynptr_key_bench *skel, unsigned int data_size, + unsigned int nr) +{ + /* The value will be used as the key for hash map */ + bpf_map__set_value_size(skel->maps.array, + offsetof(struct dynkey_key, desc) + data_size); + bpf_map__set_max_entries(skel->maps.array, nr); + + bpf_map__set_key_size(skel->maps.htab, offsetof(struct dynkey_key, desc) + data_size); + bpf_map__set_max_entries(skel->maps.htab, nr); + + bpf_map__set_max_entries(skel->maps.dynkey_htab, nr); +} + +static void dynptr_key_setup_key_map(struct bpf_map *map, struct var_size_str **set, + unsigned int nr) +{ + int fd = bpf_map__fd(map); + unsigned int i; + + for (i = 0; i < nr; i++) { + void *value; + int err; + + value = (void *)set[i]; + err = bpf_map_update_elem(fd, &i, value, 0); + if (err) { + fprintf(stderr, "add #%u key (%s) on %s error %d\n", + i, set[i]->data, bpf_map__name(map), err); + exit(1); + } + } +} + +static u64 dynptr_key_get_slab_mem(int dfd) +{ + const char *magic = "slab "; + const char *name = "memory.stat"; + int fd; + ssize_t nr; + char buf[4096]; + char *from; + + fd = openat(dfd, name, 0); + if (fd < 0) { + fprintf(stdout, "no %s (cgroup v1 ?)\n", name); + return 0; + } + + nr = read(fd, buf, 
sizeof(buf)); + if (nr <= 0) { + fprintf(stderr, "empty %s ?\n", name); + exit(1); + } + buf[nr - 1] = 0; + + close(fd); + + from = strstr(buf, magic); + if (!from) { + fprintf(stderr, "no slab in %s\n", name); + exit(1); + } + + return strtoull(from + strlen(magic), NULL, 10); +} + +static void dynptr_key_setup_lookup_map(struct bpf_map *map, unsigned int map_type, + struct var_size_str **set, unsigned int nr) +{ + int fd = bpf_map__fd(map); + unsigned int i; + + for (i = 0; i < nr; i++) { + struct dynkey_key dynkey; + void *key; + int err; + + if (map_type == NORM_HTAB) { + key = set[i]; + } else { + dynkey.cookie = set[i]->len; + bpf_dynptr_user_init(set[i]->data, set[i]->len, &dynkey.desc); + key = &dynkey; + } + /* May have duplicated keys */ + err = bpf_map_update_elem(fd, key, &i, 0); + if (err) { + fprintf(stderr, "add #%u key (%s) on %s error %d\n", + i, set[i]->data, bpf_map__name(map), err); + exit(1); + } + } +} + +static void dump_data_set_metric(struct var_size_str **set, unsigned int nr) +{ + double mean = 0.0, stddev = 0.0; + unsigned int max = 0; + unsigned int i; + + for (i = 0; i < nr; i++) { + if (set[i]->len > max) + max = set[i]->len; + mean += set[i]->len / (0.0 + nr); + } + + if (nr > 1) { + for (i = 0; i < nr; i++) + stddev += (mean - set[i]->len) * (mean - set[i]->len) / (nr - 1.0); + stddev = sqrt(stddev); + } + + fprintf(stdout, "str length: max %u mean %.0f stdev %.0f\n", max, mean, stddev); +} + +static inline unsigned int roundup_pow_of_2(unsigned int n) +{ + return 1U << (n > 1 ? 
32 - __builtin_clz(n - 1) : 0); +} + +static void dynptr_key_setup(unsigned int map_type, const char *prog_name, bool neg_test) +{ + struct var_size_str **set = NULL, **neg_set = NULL; + unsigned int nr, max_len, neg_max_len; + struct dynptr_key_bench *skel; + struct bpf_program *prog; + struct bpf_link *link; + struct bpf_map *map; + u64 before, after; + int dfd; + int err; + + nr = args.entries; + if (!args.file) + err = gen_data_set(args.max_size, nr, &set, &max_len); + else + err = parse_data_set(args.file, &set, &nr, &max_len); + if (err < 0) + exit(1); + + dump_data_set_metric(set, nr); + + if (neg_test) { + err = gen_data_set(roundup_pow_of_2(max_len), nr, &neg_set, &neg_max_len); + if (err) + goto free_str_set; + } + + dfd = cgroup_setup_and_join("/dynptr_key"); + if (dfd < 0) { + fprintf(stderr, "failed to setup cgroup env\n"); + goto free_str_set; + } + + setup_libbpf(); + + before = dynptr_key_get_slab_mem(dfd); + + skel = dynptr_key_bench__open(); + if (!skel) { + fprintf(stderr, "failed to open skeleton\n"); + goto leave_cgroup; + } + + dynptr_key_init_map_opts(skel, max_len, nr); + + skel->rodata->max_dynkey_size = max_len; + skel->bss->update_nr = nr; + skel->bss->update_chunk = nr / env.producer_cnt; + + prog = bpf_object__find_program_by_name(skel->obj, prog_name); + if (!prog) { + fprintf(stderr, "no such prog %s\n", prog_name); + goto destroy_skel; + } + bpf_program__set_autoload(prog, true); + + err = dynptr_key_bench__load(skel); + if (err) { + fprintf(stderr, "failed to load skeleton\n"); + goto destroy_skel; + } + + dynptr_key_setup_key_map(skel->maps.array, neg_test ? neg_set : set, nr); + + map = (map_type == NORM_HTAB) ? 
skel->maps.htab : skel->maps.dynkey_htab; + dynptr_key_setup_lookup_map(map, map_type, set, nr); + + after = dynptr_key_get_slab_mem(dfd); + + link = bpf_program__attach(prog); + if (!link) { + fprintf(stderr, "failed to attach %s\n", prog_name); + goto destroy_skel; + } + + ctx.skel = skel; + ctx.cgrp_dfd = dfd; + ctx.map_slab_mem = after - before; + free(neg_set); + free(set); + return; + +destroy_skel: + dynptr_key_bench__destroy(skel); +leave_cgroup: + close(dfd); + cleanup_cgroup_environment(); +free_str_set: + free(neg_set); + free(set); + exit(1); +} + +static void dynkey_htab_lookup_setup(void) +{ + dynptr_key_setup(DYNPTR_KEY_HTAB, "dynkey_htab_lookup", false); +} + +static void norm_htab_lookup_setup(void) +{ + dynptr_key_setup(NORM_HTAB, "htab_lookup", false); +} + +static void dynkey_htab_neg_lookup_setup(void) +{ + dynptr_key_setup(DYNPTR_KEY_HTAB, "dynkey_htab_lookup", true); +} + +static void norm_htab_neg_lookup_setup(void) +{ + dynptr_key_setup(NORM_HTAB, "htab_lookup", true); +} + +static void dynkey_htab_update_setup(void) +{ + dynptr_key_setup(DYNPTR_KEY_HTAB, "dynkey_htab_update", false); +} + +static void norm_htab_update_setup(void) +{ + dynptr_key_setup(NORM_HTAB, "htab_update", false); +} + +static void *dynptr_key_producer(void *ctx) +{ + while (true) + (void)syscall(__NR_getpgid); + return NULL; +} + +static void dynptr_key_measure(struct bench_res *res) +{ + static __u64 last_hits, last_drops; + __u64 total_hits = 0, total_drops = 0; + unsigned int i, nr_cpus; + + nr_cpus = bpf_num_possible_cpus(); + for (i = 0; i < nr_cpus; i++) { + struct run_stat *s = (void *)&ctx.skel->bss->percpu_stats[i & 255]; + + total_hits += s->stats[0]; + total_drops += s->stats[1]; + } + + res->hits = total_hits - last_hits; + res->drops = total_drops - last_drops; + + last_hits = total_hits; + last_drops = total_drops; +} + +static void dynptr_key_report_final(struct bench_res res[], int res_cnt) +{ + close(ctx.cgrp_dfd); + cleanup_cgroup_environment(); + + 
+	fprintf(stdout, "Slab: %.3f MiB\n", (float)ctx.map_slab_mem / 1024 / 1024);
+	hits_drops_report_final(res, res_cnt);
+}
+
+const struct bench bench_dynkey_htab_lookup = {
+	.name = "dynkey-htab-lookup",
+	.argp = &bench_dynptr_key_argp,
+	.validate = dynptr_key_validate,
+	.setup = dynkey_htab_lookup_setup,
+	.producer_thread = dynptr_key_producer,
+	.measure = dynptr_key_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = dynptr_key_report_final,
+};
+
+const struct bench bench_norm_htab_lookup = {
+	.name = "norm-htab-lookup",
+	.argp = &bench_dynptr_key_argp,
+	.validate = dynptr_key_validate,
+	.setup = norm_htab_lookup_setup,
+	.producer_thread = dynptr_key_producer,
+	.measure = dynptr_key_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = dynptr_key_report_final,
+};
+
+const struct bench bench_dynkey_htab_neg_lookup = {
+	.name = "dynkey-htab-neg-lookup",
+	.argp = &bench_dynptr_key_argp,
+	.validate = dynptr_key_validate,
+	.setup = dynkey_htab_neg_lookup_setup,
+	.producer_thread = dynptr_key_producer,
+	.measure = dynptr_key_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = dynptr_key_report_final,
+};
+
+const struct bench bench_norm_htab_neg_lookup = {
+	.name = "norm-htab-neg-lookup",
+	.argp = &bench_dynptr_key_argp,
+	.validate = dynptr_key_validate,
+	.setup = norm_htab_neg_lookup_setup,
+	.producer_thread = dynptr_key_producer,
+	.measure = dynptr_key_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = dynptr_key_report_final,
+};
+
+const struct bench bench_dynkey_htab_update = {
+	.name = "dynkey-htab-update",
+	.argp = &bench_dynptr_key_argp,
+	.validate = dynptr_key_validate,
+	.setup = dynkey_htab_update_setup,
+	.producer_thread = dynptr_key_producer,
+	.measure = dynptr_key_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = dynptr_key_report_final,
+};
+
+const struct bench bench_norm_htab_update = {
+	.name = "norm-htab-update",
+	.argp = &bench_dynptr_key_argp,
+	.validate = dynptr_key_validate,
+	.setup = norm_htab_update_setup,
+	.producer_thread = dynptr_key_producer,
+	.measure = dynptr_key_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = dynptr_key_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_dynptr_key.sh b/tools/testing/selftests/bpf/benchs/run_bench_dynptr_key.sh
new file mode 100755
index 0000000000000..a15e1a9f7ab02
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_bench_dynptr_key.sh
@@ -0,0 +1,51 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+prod_list=${PROD_LIST:-"1 2 4 8"}
+entries=${ENTRIES:-8192}
+max_size=${MAX_SIZE:-256}
+str_file=${STR_FILE:-}
+
+summarize_rate_and_mem()
+{
+	local bench="$1"
+	local mem=$(echo $2 | grep Slab: | \
+		sed -E "s/.*Slab:\s+([0-9]+\.[0-9]+ MiB).*/\1/")
+	local summary=$(echo $2 | tail -n1)
+
+	printf "%-25s %s (drops %s, mem %s)\n" "$bench" "$(hits $summary)" \
+		"$(drops $summary)" "$mem"
+}
+
+htab_bench()
+{
+	local opts="--entries ${entries} --max_size ${max_size}"
+	local desc="${entries}"
+	local name
+	local prod
+
+	if test -n "${str_file}" && test -f "${str_file}"
+	then
+		opts="--file ${str_file}"
+		desc="${str_file}"
+	fi
+
+	for name in htab-lookup htab-neg-lookup htab-update
+	do
+		for prod in ${prod_list}
+		do
+			summarize_rate_and_mem "${name}-p${prod}-${desc}" \
+				"$($RUN_BENCH -p${prod} ${1}-${name} ${opts})"
+		done
+	done
+}
+
+header "normal hash map"
+htab_bench norm
+
+header "dynptr-keyed hash map"
+htab_bench dynkey
diff --git a/tools/testing/selftests/bpf/progs/dynptr_key_bench.c b/tools/testing/selftests/bpf/progs/dynptr_key_bench.c
new file mode 100644
index 0000000000000..8bd81f53e610a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/dynptr_key_bench.c
@@ -0,0 +1,249 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025. Huawei Technologies Co., Ltd */
+#include
+#include
+#include
+#include
+#include
+
+struct bpf_map;
+
+struct dynkey_key {
+	/* Use 8 bytes to prevent an unnecessary hole */
+	__u64 cookie;
+	struct bpf_dynptr desc;
+};
+
+struct var_size_key {
+	__u64 len;
+	unsigned char data[];
+};
+
+/* Each value in this array is used as a key for the hash maps. The value
+ * size is fixed; however, its first 8 bytes denote the length of the valid
+ * data in the value.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(key_size, 4);
+} array SEC(".maps");
+
+/* key_size will be set by the benchmark */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(value_size, 4);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+} htab SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, struct dynkey_key);
+	__type(value, unsigned int);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+} dynkey_htab SEC(".maps");
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__u64 stats[2];
+} __attribute__((__aligned__(256))) percpu_stats[256];
+
+struct update_ctx {
+	unsigned int max;
+	unsigned int from;
+};
+
+volatile const unsigned int max_dynkey_size;
+unsigned int update_nr;
+unsigned int update_chunk;
+
+static __always_inline void update_stats(int idx)
+{
+	__u32 cpu = bpf_get_smp_processor_id();
+
+	percpu_stats[cpu & 255].stats[idx]++;
+}
+
+static int lookup_htab(struct bpf_map *map, __u32 *key, void *value, void *data)
+{
+	__u32 *index;
+
+	index = bpf_map_lookup_elem(&htab, value);
+	if (index && *index == *key)
+		update_stats(0);
+	else
+		update_stats(1);
+	return 0;
+}
+
+static int lookup_dynkey_htab(struct bpf_map *map, __u32 *key, void *value, void *data)
+{
+	struct var_size_key *var_size_key = value;
+	struct dynkey_key dynkey;
+	__u32 *index;
+	__u64 len;
+
+	len = var_size_key->len;
+	if (len > max_dynkey_size)
+		return 0;
+
+	dynkey.cookie = len;
+	bpf_dynptr_from_mem(var_size_key->data, len, 0, &dynkey.desc);
+	index = bpf_map_lookup_elem(&dynkey_htab, &dynkey);
+	if (index && *index == *key)
+		update_stats(0);
+	else
+		update_stats(1);
+	return 0;
+}
+
+static int update_htab_loop(unsigned int i, void *ctx)
+{
+	struct update_ctx *update = ctx;
+	void *value;
+	int err;
+
+	if (update->from >= update->max)
+		update->from = 0;
+	value = bpf_map_lookup_elem(&array, &update->from);
+	if (!value)
+		return 1;
+
+	err = bpf_map_update_elem(&htab, value, &update->from, 0);
+	if (!err)
+		update_stats(0);
+	else
+		update_stats(1);
+	update->from++;
+
+	return 0;
+}
+
+static int delete_htab_loop(unsigned int i, void *ctx)
+{
+	struct update_ctx *update = ctx;
+	void *value;
+	int err;
+
+	if (update->from >= update->max)
+		update->from = 0;
+	value = bpf_map_lookup_elem(&array, &update->from);
+	if (!value)
+		return 1;
+
+	err = bpf_map_delete_elem(&htab, value);
+	if (!err)
+		update_stats(0);
+	update->from++;
+
+	return 0;
+}
+
+static int update_dynkey_htab_loop(unsigned int i, void *ctx)
+{
+	struct update_ctx *update = ctx;
+	struct var_size_key *value;
+	struct dynkey_key dynkey;
+	__u64 len;
+	int err;
+
+	if (update->from >= update->max)
+		update->from = 0;
+	value = bpf_map_lookup_elem(&array, &update->from);
+	if (!value)
+		return 1;
+	len = value->len;
+	if (len > max_dynkey_size)
+		return 1;
+
+	dynkey.cookie = len;
+	bpf_dynptr_from_mem(value->data, len, 0, &dynkey.desc);
+	err = bpf_map_update_elem(&dynkey_htab, &dynkey, &update->from, 0);
+	if (!err)
+		update_stats(0);
+	else
+		update_stats(1);
+	update->from++;
+
+	return 0;
+}
+
+static int delete_dynkey_htab_loop(unsigned int i, void *ctx)
+{
+	struct update_ctx *update = ctx;
+	struct var_size_key *value;
+	struct dynkey_key dynkey;
+	__u64 len;
+	int err;
+
+	if (update->from >= update->max)
+		update->from = 0;
+	value = bpf_map_lookup_elem(&array, &update->from);
+	if (!value)
+		return 1;
+	len = value->len;
+	if (len > max_dynkey_size)
+		return 1;
+
+	dynkey.cookie = len;
+	bpf_dynptr_from_mem(value->data, len, 0, &dynkey.desc);
+	err = bpf_map_delete_elem(&dynkey_htab, &dynkey);
+	if (!err)
+		update_stats(0);
+	update->from++;
+
+	return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int htab_lookup(void *ctx)
+{
+	bpf_for_each_map_elem(&array, lookup_htab, NULL, 0);
+	return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int dynkey_htab_lookup(void *ctx)
+{
+	bpf_for_each_map_elem(&array, lookup_dynkey_htab, NULL, 0);
+	return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int htab_update(void *ctx)
+{
+	unsigned int index = bpf_get_smp_processor_id() * update_chunk;
+	struct update_ctx update;
+
+	update.max = update_nr;
+	if (update.max && index >= update.max)
+		index %= update.max;
+
+	/* Only operate on part of the keys according to the cpu id */
+	update.from = index;
+	bpf_loop(update_chunk, update_htab_loop, &update, 0);
+
+	update.from = index;
+	bpf_loop(update_chunk, delete_htab_loop, &update, 0);
+
+	return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int dynkey_htab_update(void *ctx)
+{
+	unsigned int index = bpf_get_smp_processor_id() * update_chunk;
+	struct update_ctx update;
+
+	update.max = update_nr;
+	if (update.max && index >= update.max)
+		index %= update.max;
+
+	/* Only operate on part of the keys according to the cpu id */
+	update.from = index;
+	bpf_loop(update_chunk, update_dynkey_htab_loop, &update, 0);
+
+	update.from = index;
+	bpf_loop(update_chunk, delete_dynkey_htab_loop, &update, 0);
+
+	return 0;
+}