From patchwork Sat Sep 17 15:31:16 2022
From: Hou Tao
To: bpf@vger.kernel.org, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
    Martin KaFai Lau, KP Singh, "David S. Miller", Jakub Kicinski,
    Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
    "Paul E. McKenney", houtao1@huawei.com
Subject: [PATCH bpf-next 01/10] bpf: Export bpf_dynptr_{get|set}_size
Date: Sat, 17 Sep 2022 23:31:16 +0800
Message-Id: <20220917153125.2001645-2-houtao@huaweicloud.com>
In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com>
References: <20220917153125.2001645-1-houtao@huaweicloud.com>
X-Patchwork-Id: 12979175

From: Hou Tao

For maps with a bpf_dynptr-typed key, the lookup and update procedures
will use bpf_dynptr_get_size() to get the length of the key, and the
iteration procedure will use bpf_dynptr_set_size() to set the length of
the returned key.
Signed-off-by: Hou Tao
---
 include/linux/bpf.h  | 2 ++
 kernel/bpf/helpers.c | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e0dbe0c0a17e..8da4a8190413 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2647,6 +2647,8 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
                     enum bpf_dynptr_type type, u32 offset, u32 size);
 void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
 int bpf_dynptr_check_size(u32 size);
+u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr);
+void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size);

 #ifdef CONFIG_BPF_LSM
 void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 41aeaf3862ec..7dbda4acb2c2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1408,11 +1408,18 @@ static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_typ
        ptr->size |= type << DYNPTR_TYPE_SHIFT;
 }

-static u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
+u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr)
 {
        return ptr->size & DYNPTR_SIZE_MASK;
 }

+void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size)
+{
+       u32 metadata = ptr->size & ~DYNPTR_SIZE_MASK;
+
+       ptr->size = new_size | metadata;
+}
+
 int bpf_dynptr_check_size(u32 size)
 {
        return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
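[For illustration: a standalone userspace model of why bpf_dynptr_set_size()
masks before assigning. The dynptr size word packs metadata such as the
dynptr type into the bits above DYNPTR_SIZE_MASK, so a plain store would
wipe it. The mask and shift values below are hypothetical, not the
kernel's.]

    #include <assert.h>
    #include <stdint.h>

    #define SIZE_MASK  0x00ffffffu   /* low bits: size (assumed layout) */
    #define TYPE_SHIFT 28            /* high bits: dynptr type (assumed) */

    static uint32_t get_size(uint32_t word) { return word & SIZE_MASK; }

    static uint32_t set_size(uint32_t word, uint32_t new_size)
    {
        /* keep the metadata bits, replace only the size bits */
        return (word & ~SIZE_MASK) | new_size;
    }

    int main(void)
    {
        uint32_t word = (2u << TYPE_SHIFT) | 100; /* type 2, size 100 */

        word = set_size(word, 64);
        assert(get_size(word) == 64);
        assert(word >> TYPE_SHIFT == 2);          /* metadata survived */
        return 0;
    }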
McKenney" , houtao1@huawei.com Subject: [PATCH bpf-next 02/10] bpf: Add helper btf_type_is_bpf_dynptr() Date: Sat, 17 Sep 2022 23:31:17 +0800 Message-Id: <20220917153125.2001645-3-houtao@huaweicloud.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com> References: <20220917153125.2001645-1-houtao@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: Syh0CgAnenMO5CVjskryAw--.61987S6 X-Coremail-Antispam: 1UD129KBjvJXoW7CrykJr1fCFyfCF1ktF1fJFb_yoW8AF4Upa 1DJayxZr48JrWagw1DWa15Cw43Kw4UGw4jyFy2g343tFZIqr9rXF4DKr1xuryYkrWjkr17 ZrnIg398A34xArDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBYb4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUXw A2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0rcxS w2x7M28EF7xvwVC0I7IYx2IY67AKxVW7JVWDJwA2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxV W8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E14v2 6rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMc Ij6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_ Jr0_Gr1lF7xvr2IYc2Ij64vIr41lFIxGxcIEc7CjxVA2Y2ka0xkIwI1l42xK82IYc2Ij64 vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8G jcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE2I x0cI8IcVAFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26F4j6r4UJwCI42IY6xAI w20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x 0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU1sa9DUUUUU== X-CM-SenderInfo: xkrx3t3r6k3tpzhluzxrxghudrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Hou Tao Add helper btf_type_is_bpf_dynptr() to check whether or not the passed btf type is bpf_dynptr. It will be used by bpf map (e.g. qp-trie) which uses bpf_dynptr as map key. 
Signed-off-by: Hou Tao
---
 include/linux/btf.h | 1 +
 kernel/bpf/btf.c    | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 1fcc833a8690..d52f9979ea49 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -156,6 +156,7 @@ int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
 int btf_find_timer(const struct btf *btf, const struct btf_type *t);
 struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
                                          const struct btf_type *t);
+bool btf_type_is_bpf_dynptr(const struct btf *btf, const struct btf_type *t);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index b3940c605aac..f94b6112a649 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3522,6 +3522,13 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
        return ERR_PTR(ret);
 }

+bool btf_type_is_bpf_dynptr(const struct btf *btf, const struct btf_type *t)
+{
+       return __btf_type_is_struct(t) &&
+              t->size == sizeof(struct bpf_dynptr) &&
+              !strcmp("bpf_dynptr", __btf_name_by_offset(btf, t->name_off));
+}
+
 static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
                              u32 type_id, void *data, u8 bits_offset,
                              struct btf_show *show)
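[For illustration: a hypothetical BPF-side map definition whose key BTF
would satisfy this check. The map type and flags come from later patches
in this series; max_entries and map_extra are placeholders.]

    struct {
            __uint(type, BPF_MAP_TYPE_QP_TRIE);
            __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY);
            __uint(max_entries, 1000);
            __uint(map_extra, 64);          /* max size of a dynptr key */
            __type(key, struct bpf_dynptr); /* passes btf_type_is_bpf_dynptr() */
            __type(value, __u32);
    } trie SEC(".maps");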
McKenney" , houtao1@huawei.com Subject: [PATCH bpf-next 03/10] bpf: Support bpf_dynptr-typed map key for bpf syscall Date: Sat, 17 Sep 2022 23:31:18 +0800 Message-Id: <20220917153125.2001645-4-houtao@huaweicloud.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com> References: <20220917153125.2001645-1-houtao@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: Syh0CgAnenMO5CVjskryAw--.61987S7 X-Coremail-Antispam: 1UD129KBjvJXoWxtFWDJry3AryfWFy8Cr45KFg_yoWfuFyrpF 4kCrWfZrWFqFy7JrWDJa10vw40qws2gw17GrZ3K34rCFsrXr9xZr18tFWvgr90vFyDGr4S qrWkKayrZ348ArJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBYb4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUWw A2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0rcxS w2x7M28EF7xvwVC0I7IYx2IY67AKxVW7JVWDJwA2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxV W8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E14v2 6rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMc Ij6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_ Jr0_Gr1lF7xvr2IYc2Ij64vIr41lFIxGxcIEc7CjxVA2Y2ka0xkIwI1l42xK82IYc2Ij64 vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8G jcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE2I x0cI8IcVAFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26F4j6r4UJwCI42IY6xAI w20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x 0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU1c4S7UUUUU== X-CM-SenderInfo: xkrx3t3r6k3tpzhluzxrxghudrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Hou Tao Userspace application uses bpf syscall to lookup or update bpf map. It passes a pointer of fixed-size buffer to kernel to represent the map key. To support map with variable-length key, introduce bpf_dynptr_user to allow userspace to pass a pointer of bpf_dynptr_user to specify the address and the length of key buffer. And in order to represent dynptr from userspace, adding a new dynptr type: BPF_DYNPTR_TYPE_USER. Because BPF_DYNPTR_TYPE_USER-typed dynptr is not available for bpf program, so verifier doesn't need update. To distinguish map with fixed-size key from map with variable-length one, add a new map flag: BPF_F_DYNPTR_KEY to do that. For map which enables BPF_F_DYNPTR_KEY, key btf type must be bpf_dynptr and the lower 32-bits of map_extra is used to specify the maximum size of dynptr key. 
Signed-off-by: Hou Tao
Reported-by: kernel test robot
Reported-by: kernel test robot
---
 include/linux/bpf.h            |   2 +
 include/uapi/linux/bpf.h       |   9 +++
 kernel/bpf/syscall.c           | 108 ++++++++++++++++++++++++++++-----
 tools/include/uapi/linux/bpf.h |   8 +++
 4 files changed, 113 insertions(+), 14 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8da4a8190413..5060d7aee08c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2641,6 +2641,8 @@ enum bpf_dynptr_type {
        BPF_DYNPTR_TYPE_LOCAL,
        /* Underlying data is a ringbuf record */
        BPF_DYNPTR_TYPE_RINGBUF,
+       /* Points to memory copied from/to userspace */
+       BPF_DYNPTR_TYPE_USER,
 };

 void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3df78c56c1bf..77a2828f8148 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1246,6 +1246,9 @@ enum {
 /* Create a map that is suitable to be an inner map with dynamic max entries */
        BPF_F_INNER_MAP = (1U << 12),
+
+/* Map with bpf_dynptr-typed key */
+       BPF_F_DYNPTR_KEY = (1U << 13),
 };

 /* Flags for BPF_PROG_QUERY. */
@@ -6775,6 +6778,12 @@ struct bpf_timer {
        __u64 :64;
 } __attribute__((aligned(8)));

+struct bpf_dynptr_user {
+       __u64 data;
+       __u32 size;
+       __u32 :32;
+} __attribute__((aligned(8)));
+
 struct bpf_dynptr {
        __u64 :64;
        __u64 :64;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index dab156f09f8d..fd15c13cef24 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -151,6 +151,11 @@ bool bpf_map_write_active(const struct bpf_map *map)
        return atomic64_read(&map->writecnt) != 0;
 }

+static inline bool is_dynptr_key(const struct bpf_map *map)
+{
+       return map->map_flags & BPF_F_DYNPTR_KEY;
+}
+
 static u32 bpf_map_value_size(const struct bpf_map *map)
 {
        if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
@@ -994,7 +999,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
        /* Some maps allow key to be unspecified. */
        if (btf_key_id) {
                key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
-               if (!key_type || key_size != map->key_size)
+               if (!key_type || key_size != map->key_size ||
+                   (is_dynptr_key(map) && !btf_type_is_bpf_dynptr(btf, key_type)))
                        return -EINVAL;
        } else {
                key_type = btf_type_by_id(btf, 0);
@@ -1089,9 +1095,16 @@ static int map_create(union bpf_attr *attr)
                return -EINVAL;
        }

-       if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
-           attr->map_extra != 0)
+       if (attr->map_flags & BPF_F_DYNPTR_KEY) {
+               /* The lower 32-bits of map_extra specifies the maximum size
+                * of bpf_dynptr-typed key
+                */
+               if (!attr->btf_key_type_id || !attr->map_extra || (attr->map_extra >> 32) ||
+                   bpf_dynptr_check_size(attr->map_extra))
+                       return -EINVAL;
+       } else if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER && attr->map_extra != 0) {
                return -EINVAL;
+       }

        f_flags = bpf_get_file_flag(attr->map_flags);
        if (f_flags < 0)
@@ -1280,8 +1293,39 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
        return -ENOTSUPP;
 }

-static void *__bpf_copy_key(void __user *ukey, u64 key_size)
+static void *bpf_copy_from_dynptr_ukey(bpfptr_t ukey)
 {
+       struct bpf_dynptr_kern *kptr;
+       struct bpf_dynptr_user uptr;
+       bpfptr_t data;
+
+       if (copy_from_bpfptr(&uptr, ukey, sizeof(uptr)))
+               return ERR_PTR(-EFAULT);
+
+       if (!uptr.size || bpf_dynptr_check_size(uptr.size))
+               return ERR_PTR(-EINVAL);
+
+       /* Allocate and free bpf_dynptr_kern and its data together */
+       kptr = kvmalloc(sizeof(*kptr) + uptr.size, GFP_USER | __GFP_NOWARN);
+       if (!kptr)
+               return ERR_PTR(-ENOMEM);
+
+       data = make_bpfptr(uptr.data, bpfptr_is_kernel(ukey));
+       if (copy_from_bpfptr(&kptr[1], data, uptr.size)) {
+               kvfree(kptr);
+               return ERR_PTR(-EFAULT);
+       }
+
+       bpf_dynptr_init(kptr, &kptr[1], BPF_DYNPTR_TYPE_USER, 0, uptr.size);
+
+       return kptr;
+}
+
+static void *__bpf_copy_key(void __user *ukey, u64 key_size, bool dynptr)
+{
+       if (dynptr)
+               return bpf_copy_from_dynptr_ukey(USER_BPFPTR(ukey));
+
        if (key_size)
                return vmemdup_user(ukey, key_size);

@@ -1291,8 +1335,11 @@ static void *__bpf_copy_key(void __user *ukey, u64 key_size)
        return NULL;
 }

-static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size)
+static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size, bool dynptr)
 {
+       if (dynptr)
+               return bpf_copy_from_dynptr_ukey(ukey);
+
        if (key_size)
                return kvmemdup_bpfptr(ukey, key_size);

@@ -1302,6 +1349,34 @@ static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size)
        return NULL;
 }

+static void *bpf_new_dynptr_key(u32 key_size)
+{
+       struct bpf_dynptr_kern *kptr;
+
+       kptr = kvmalloc(sizeof(*kptr) + key_size, GFP_USER | __GFP_NOWARN);
+       if (kptr)
+               bpf_dynptr_init(kptr, &kptr[1], BPF_DYNPTR_TYPE_USER, 0, key_size);
+       return kptr;
+}
+
+static int bpf_copy_to_dynptr_ukey(struct bpf_dynptr_user __user *uptr,
+                                  struct bpf_dynptr_kern *kptr)
+{
+       unsigned int size;
+       u64 udata;
+
+       if (get_user(udata, &uptr->data))
+               return -EFAULT;
+
+       /* Also zeroing the reserved field in uptr */
+       size = bpf_dynptr_get_size(kptr);
+       if (copy_to_user(u64_to_user_ptr(udata), kptr->data + kptr->offset, size) ||
+           put_user(size, &uptr->size) || put_user(0, &uptr->size + 1))
+               return -EFAULT;
+
+       return 0;
+}
+
 /* last field in 'union bpf_attr' used by this command */
 #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD flags

@@ -1337,7 +1412,7 @@ static int map_lookup_elem(union bpf_attr *attr)
                goto err_put;
        }

-       key = __bpf_copy_key(ukey, map->key_size);
+       key = __bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
        if (IS_ERR(key)) {
                err = PTR_ERR(key);
                goto err_put;
@@ -1377,7 +1452,6 @@ static int map_lookup_elem(union bpf_attr *attr)
        return err;
 }

-
 #define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags

 static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
@@ -1410,7 +1484,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
                goto err_put;
        }

-       key = ___bpf_copy_key(ukey, map->key_size);
+       key = ___bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
        if (IS_ERR(key)) {
                err = PTR_ERR(key);
                goto err_put;
@@ -1458,7 +1532,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
                goto err_put;
        }

-       key = ___bpf_copy_key(ukey, map->key_size);
+       key = ___bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
        if (IS_ERR(key)) {
                err = PTR_ERR(key);
                goto err_put;
@@ -1514,7 +1588,7 @@ static int map_get_next_key(union bpf_attr *attr)
        }

        if (ukey) {
-               key = __bpf_copy_key(ukey, map->key_size);
+               key = __bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
                if (IS_ERR(key)) {
                        err = PTR_ERR(key);
                        goto err_put;
@@ -1524,7 +1598,10 @@ static int map_get_next_key(union bpf_attr *attr)
        }

        err = -ENOMEM;
-       next_key = kvmalloc(map->key_size, GFP_USER);
+       if (is_dynptr_key(map))
+               next_key = bpf_new_dynptr_key(map->map_extra);
+       else
+               next_key = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN);
        if (!next_key)
                goto free_key;

@@ -1540,8 +1617,11 @@ static int map_get_next_key(union bpf_attr *attr)
        if (err)
                goto free_next_key;

-       err = -EFAULT;
-       if (copy_to_user(unext_key, next_key, map->key_size) != 0)
+       if (is_dynptr_key(map))
+               err = bpf_copy_to_dynptr_ukey(unext_key, next_key);
+       else
+               err = copy_to_user(unext_key, next_key, map->key_size) != 0 ? -EFAULT : 0;
+       if (err)
                goto free_next_key;

        err = 0;
@@ -1815,7 +1895,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
                goto err_put;
        }

-       key = __bpf_copy_key(ukey, map->key_size);
+       key = __bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
        if (IS_ERR(key)) {
                err = PTR_ERR(key);
                goto err_put;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 3df78c56c1bf..600c3fcee37a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1246,6 +1246,8 @@ enum {
 /* Create a map that is suitable to be an inner map with dynamic max entries */
        BPF_F_INNER_MAP = (1U << 12),
+/* Map with bpf_dynptr-typed key */
+       BPF_F_DYNPTR_KEY = (1U << 13),
 };

 /* Flags for BPF_PROG_QUERY. */
@@ -6775,6 +6777,12 @@ struct bpf_timer {
        __u64 :64;
 } __attribute__((aligned(8)));

+struct bpf_dynptr_user {
+       __u64 data;
+       __u32 size;
+       __u32 :32;
+} __attribute__((aligned(8)));
+
 struct bpf_dynptr {
        __u64 :64;
        __u64 :64;
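[For illustration: a minimal userspace sketch of a lookup against a
BPF_F_DYNPTR_KEY map under the ABI added above. The map fd and key
string are placeholders and error handling is trimmed;
bpf_map_lookup_elem() here is libbpf's syscall wrapper, which passes
the key pointer through to the kernel unchanged.]

    #include <string.h>
    #include <linux/bpf.h>
    #include <bpf/bpf.h>

    int lookup_str_key(int map_fd, const char *str, __u32 *value)
    {
            struct bpf_dynptr_user dynptr;

            /* zero the reserved field, then point the dynptr at the key */
            memset(&dynptr, 0, sizeof(dynptr));
            dynptr.data = (__u64)(unsigned long)str;
            dynptr.size = strlen(str) + 1;

            /* the kernel copies in dynptr.size bytes from dynptr.data */
            return bpf_map_lookup_elem(map_fd, &dynptr, value);
    }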
McKenney" , houtao1@huawei.com Subject: [PATCH bpf-next 04/10] bpf: Support bpf_dynptr-typed map key for bpf program Date: Sat, 17 Sep 2022 23:31:19 +0800 Message-Id: <20220917153125.2001645-5-houtao@huaweicloud.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com> References: <20220917153125.2001645-1-houtao@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: Syh0CgAnenMO5CVjskryAw--.61987S8 X-Coremail-Antispam: 1UD129KBjvJXoW7KF17ZryrtF1rGr4fZFWUArb_yoW8Gr17pF 1kGrW2gr48KFsI9w13ZFZ7Zry5G34Sq3y3Cw4Sy3ySvF17ArW5u3y8GF17WFy5trZ7K34F yw4qvFWYv34UCFJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBIb4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUAV Cq3wA2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0 rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVW7JVWDJwA2z4x0Y4vE2Ix0cI8IcVCY1x0267 AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E 14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7 xfMcIj6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Y z7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lFIxGxcIEc7CjxVA2Y2ka0xkIwI1l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26F4j6r4UJwCI42IY 6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aV CY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU13l1DUUUUU== X-CM-SenderInfo: xkrx3t3r6k3tpzhluzxrxghudrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Hou Tao When bpf program manipulates a BPF_F_DYNPTR_KEY-enabled map, only allow it to use a bpf_dynptr as map key. 
Signed-off-by: Hou Tao
---
 kernel/bpf/verifier.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8c6fbcd0afaf..169c0b3e8002 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6000,9 +6000,21 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
                        verbose(env, "invalid map_ptr to access map->key\n");
                        return -EACCES;
                }
-               err = check_helper_mem_access(env, regno,
-                                             meta->map_ptr->key_size, false,
-                                             NULL);
+               /* For BPF_F_DYNPTR_KEY-enabled map, only allow bpf_dynptr
+                * to be used as map key
+                */
+               if (meta->map_ptr->map_flags & BPF_F_DYNPTR_KEY) {
+                       if (base_type(reg->type) != PTR_TO_STACK ||
+                           !is_dynptr_reg_valid_init(env, reg, ARG_PTR_TO_DYNPTR)) {
+                               verbose(env, "expect R%d to be dynptr instead of %s\n",
+                                       regno, reg_type_str(env, reg->type));
+                               return -EACCES;
+                       }
+               } else {
+                       err = check_helper_mem_access(env, regno,
+                                                     meta->map_ptr->key_size, false,
+                                                     NULL);
+               }
                break;
        case ARG_PTR_TO_MAP_VALUE:
                if (type_may_be_null(arg_type) && register_is_null(reg))
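[For illustration: a hypothetical BPF program fragment that would pass
this check. It builds a stack dynptr with the existing
bpf_dynptr_from_mem() helper and uses it as the key of the "trie" map
sketched earlier; this exact pairing is an assumption, not a selftest
from this series.]

    char buf[4] = "abc";
    struct bpf_dynptr key;
    __u32 *value;

    /* dynptr lives on the stack (PTR_TO_STACK) and is fully initialized */
    bpf_dynptr_from_mem(buf, sizeof(buf), 0, &key);

    /* for a BPF_F_DYNPTR_KEY map, the key argument must be this dynptr */
    value = bpf_map_lookup_elem(&trie, &key);
    if (value)
            __sync_fetch_and_add(value, 1);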
McKenney" , houtao1@huawei.com Subject: [PATCH bpf-next 05/10] libbpf: Add helpers for bpf_dynptr_user Date: Sat, 17 Sep 2022 23:31:20 +0800 Message-Id: <20220917153125.2001645-6-houtao@huaweicloud.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com> References: <20220917153125.2001645-1-houtao@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: Syh0CgAnenMO5CVjskryAw--.61987S9 X-Coremail-Antispam: 1UD129KBjvJXoW7Zr4DKFy5tF1kAr18Gw1xXwb_yoW8XF1Dpa yfK3y3Zr4rJFW3Cwn8JF4SyrW5uF4xXr1UKrWxK34rAr43XFZ8ZF1jkr1ayr1YyrWkWrWI vrZxKrW5WF18JF7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBSb4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUAV Cq3wA2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0 rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVW7JVWDJwA2z4x0Y4vE2Ix0cI8IcVCY1x0267 AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E 14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7 xfMcIj6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Y z7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lFIxGxcIEc7CjxVA2Y2ka0xkIwI1l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAI cVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMIIF0xvEx4A2js IEc7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjxUFgAwUUUUU X-CM-SenderInfo: xkrx3t3r6k3tpzhluzxrxghudrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Hou Tao Add bpf_dynptr_user_init() to initialize a bpf_dynptr, and bpf_dynptr_user_get_{data,size} to get the buffer address and buffer length of bpf_dynptr_user. Instead of exporting these symbols, simply adding these helpers as inline functions. 
Signed-off-by: Hou Tao
---
 tools/lib/bpf/bpf.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 9c50beabdd14..cd3e9dbaffa1 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -371,6 +371,25 @@ LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_link_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len);

+/* sys_bpf() will check the validity of size */
+static inline void bpf_dynptr_user_init(void *data, __u32 size, struct bpf_dynptr_user *dynptr)
+{
+       /* Zero padding bytes */
+       memset(dynptr, 0, sizeof(*dynptr));
+       dynptr->data = (__u64)(unsigned long)data;
+       dynptr->size = size;
+}
+
+static inline __u32 bpf_dynptr_user_get_size(const struct bpf_dynptr_user *dynptr)
+{
+       return dynptr->size;
+}
+
+static inline void *bpf_dynptr_user_get_data(const struct bpf_dynptr_user *dynptr)
+{
+       return (void *)(unsigned long)dynptr->data;
+}
+
 struct bpf_prog_query_opts {
        size_t sz; /* size of this struct for forward/backward compatibility */
        __u32 query_flags;
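[For illustration: a sketch of how these helpers might combine with
bpf_map_get_next_key() (patch 3) to walk the keys of a dynptr-key map.
The 64-byte buffers mirror a map_extra of 64 and are an assumption.]

    #include <stdio.h>
    #include <string.h>
    #include <linux/bpf.h>
    #include <bpf/bpf.h>

    static void dump_keys(int map_fd)
    {
            char cur_buf[64], next_buf[64];
            struct bpf_dynptr_user cur, next;
            struct bpf_dynptr_user *prev = NULL;

            for (;;) {
                    /* give the kernel the full buffer capacity each time */
                    bpf_dynptr_user_init(next_buf, sizeof(next_buf), &next);
                    if (bpf_map_get_next_key(map_fd, prev, &next))
                            break; /* no more keys */

                    /* the kernel shrank size to the returned key length */
                    printf("key: %.*s\n",
                           (int)bpf_dynptr_user_get_size(&next),
                           (char *)bpf_dynptr_user_get_data(&next));

                    /* the returned key becomes the cursor for the next call */
                    memcpy(cur_buf, next_buf, sizeof(cur_buf));
                    bpf_dynptr_user_init(cur_buf,
                                         bpf_dynptr_user_get_size(&next), &cur);
                    prev = &cur;
            }
    }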
McKenney" , houtao1@huawei.com Subject: [PATCH bpf-next 06/10] bpf: Add support for qp-trie map Date: Sat, 17 Sep 2022 23:31:21 +0800 Message-Id: <20220917153125.2001645-7-houtao@huaweicloud.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com> References: <20220917153125.2001645-1-houtao@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: Syh0CgAnenMO5CVjskryAw--.61987S10 X-Coremail-Antispam: 1UD129KBjvAXoWfZr4UKFykXw1UWw1fJFWkZwb_yoW5uF1fuo WxArs8tr1kGr1UZF4rGasaqFyfu3yDWFykArsa9rsI9F13tF1Yv3yUCa1fJF42qr4rGF43 X34IqwnxJr48Jr1kn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUYu7kC6x804xWl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK 8VAvwI8IcIk0rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF 0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vE j48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7CjxV AFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x02 67AKxVW0oVCq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I 80ewAv7VC0I7IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCj c4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JM4IIrI8v6xkF7I0E8cxan2IY04v7MxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2Iq xVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42 IY6xIIjxv20xvE14v26r1I6r4UMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF 0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87 Iv6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7IU13l1DUUUUU== X-CM-SenderInfo: xkrx3t3r6k3tpzhluzxrxghudrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Hou Tao The initial motivation for qp-trie map is to reduce memory usage for string keys specially those with large differencies in length. Moreover as a big-endian lexicographic-ordered map, qp-trie can also be used for any binary data with fixed or variable length. The memory efficiency of qp-tries comes partly from the design of qp-trie which doesn't save key for branch node and uses sparse array to save leaf nodes, partly comes from the support of bpf_dynptr-typed key: only the used part in key is saved. But the memory efficiency and ordered keys come with cost: for strings (e.g. symbol names in /proc/kallsyms) the lookup performance of qp-trie is about 30% or more slower compared with hash-table. But the lookup performance is not always bad than hash-table, for randomly generated binary data set with big differencies in length, the lookup performance of qp-trie will be twice as good as hash-table as showed in the following benchmark. 
Signed-off-by: Hou Tao
---
 include/linux/bpf_types.h      |    1 +
 include/uapi/linux/bpf.h       |    1 +
 kernel/bpf/Makefile            |    1 +
 kernel/bpf/bpf_qp_trie.c       | 1055 ++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |    1 +
 5 files changed, 1059 insertions(+)
 create mode 100644 kernel/bpf/bpf_qp_trie.c

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 2b9112b80171..a73d47f83d74 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_QP_TRIE, qp_trie_map_ops)

 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 77a2828f8148..e35e70b5cf0d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -928,6 +928,7 @@ enum bpf_map_type {
        BPF_MAP_TYPE_INODE_STORAGE,
        BPF_MAP_TYPE_TASK_STORAGE,
        BPF_MAP_TYPE_BLOOM_FILTER,
+       BPF_MAP_TYPE_QP_TRIE,
 };

 /* Note that tracing related programs such as
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 341c94f208f4..8419f44fea50 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_i
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_qp_trie.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
diff --git a/kernel/bpf/bpf_qp_trie.c b/kernel/bpf/bpf_qp_trie.c
new file mode 100644
index 000000000000..4f56f69360ce
--- /dev/null
+++ b/kernel/bpf/bpf_qp_trie.c
@@ -0,0 +1,1055 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Derived from qp.c in https://github.com/fanf2/qp.git
+ *
+ * Copyright (C) 2022. Huawei Technologies Co., Ltd
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* qp-trie (quadbit popcount trie) is a memory efficient trie. Unlike
+ * normal trie which uses byte as lookup key, qp-trie interprets its keys
+ * as quadbit/nibble array and uses one nibble each time during lookup.
+ * The most significant nibble (upper nibble) of byte N in the key will
+ * be the 2*N element of nibble array, and the least significant nibble
+ * (lower nibble) of byte N will be the 2*N+1 element in nibble array.
+ *
+ * For normal trie, it may have 256 child nodes, and for qp-trie one branch
+ * node may have 17 child nodes. #0 child node is special because it must
+ * be a leaf node and its key is the same as the branch node. #1~#16 child
+ * nodes represent leaf nodes or branch nodes which have different keys
+ * with parent node. The key of branch node is the common prefix for these
+ * child nodes, and the index of child node minus one is the value of first
+ * different nibble between these child nodes.
+ *
+ * qp-trie reduces memory usage through two methods:
+ * (1) Branch node doesn't store the key. It only stores the position of
+ *     the first nibble which differentiates child nodes.
+ * (2) Branch node doesn't store all 17 child nodes. It uses a bitmap and
+ *     popcount() to implement a sparse array and only allocates memory
+ *     for those present children.
It uses a bitmap and + * popcount() to implement a sparse array and only allocates memory + * for those present children. + * + * Like normal trie, qp-trie is also ordered and is in big-endian + * lexicographic order. If traverse qp-trie in a depth-first way, it will + * return a string of ordered keys. + * + * The following diagrams show the construction of a tiny qp-trie: + * + * (1) insert abc + * + * [ leaf node: abc ] + * + * (2) insert abc_d + * + * The first different nibble between "abc" and "abc_d" is the upper nibble + * of character '_' (0x5), and its position in nibble array is 6 + * (starts from 0). + * + * [ branch node ] bitmap: 0x41 diff pos: 6 + * | + * * + * children + * [0] [6] + * | | + * [leaf: abc] [leaf: abc_d] + * + * (3) insert abc_e + * + * The first different nibble between "abc_d" and "abc_e" is the lower + * nibble of character 'd'/'e', and its position in array is 9. + * + * [ branch node ] bitmap: 0x41 diff pos: 6 + * | + * * + * children + * [0] [6] + * | | + * [leaf: abc] | + * * + * [ branch node ] bitmap: 0x60 diff pos: 9 + * | + * * + * children + * [5] [6] + * | | + * [leaf: abc_d] [leaf: abc_e] + */ + +#define QP_TRIE_MANDATORY_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY) +#define QP_TRIE_CREATE_FLAG_MASK (QP_TRIE_MANDATORY_FLAG_MASK | BPF_F_NUMA_NODE | \ + BPF_F_ACCESS_MASK) + +/* bit[0] of nodes in qp_trie_branch is used to tell node type: + * + * bit[0]: 0-branch node + * bit[0]: 1-leaf node + * + * Size of qp_trie_branch is already 2-bytes aligned, so only need to make + * allocation of leaf node to be 2-bytes aligned. + */ +#define QP_TRIE_LEAF_NODE_MASK 1UL +#define QP_TRIE_LEAF_ALLOC_ALIGN 2 + +/* To reduce memory usage, only qp_trie_branch is RCU-freed. To handle + * freeing of the last leaf node, an extra qp_trie_branch node is + * allocated. The branch node has only one child and its index is 0. It + * is set as root node after adding the first leaf node. + */ +#define QP_TRIE_ROOT_NODE_INDEX 0 +#define QP_TRIE_NON_ROOT_NODE_MASK 1 + +#define QP_TRIE_NIBBLE_SHIFT 1 +#define QP_TRIE_BYTE_INDEX_SHIFT 2 + +#define QP_TRIE_TWIGS_FREE_NONE_IDX 17 + +struct qp_trie_branch { + /* The bottom two bits of index are used as special flags: + * + * bit[0]: 0-root, 1-not root + * bit[1]: 0-upper nibble, 1-lower nibble + * + * bit[2:31]: byte index for key + */ + unsigned int index; + /* 17 bits are used to accommodate arbitrary keys, even when there are + * zero-bytes in these keys. 
+        *
+        * bit[0]: a leaf node has the same key as the prefix of parent node
+        * bit[N]: a child node with the value of nibble at index as (N - 1)
+        */
+       unsigned int bitmap:17;
+       /* The index of leaf node will be RCU-freed together */
+       unsigned int to_free_idx:5;
+       struct qp_trie_branch __rcu *parent;
+       struct rcu_head rcu;
+       void __rcu *nodes[0];
+};
+
+#define QP_TRIE_NR_SUBTREE 256
+
+struct qp_trie {
+       struct bpf_map map;
+       atomic_t entries;
+       void __rcu *roots[QP_TRIE_NR_SUBTREE];
+       spinlock_t locks[QP_TRIE_NR_SUBTREE];
+};
+
+/* Internally use qp_trie_key instead of bpf_dynptr_kern
+ * to reduce memory usage
+ */
+struct qp_trie_key {
+       /* the length of blob data */
+       unsigned int len;
+       /* blob data */
+       unsigned char data[0];
+};
+
+struct qp_trie_diff {
+       unsigned int index;
+       unsigned int sibling_bm;
+       unsigned int new_bm;
+};
+
+static inline void *to_child_node(const struct qp_trie_key *key)
+{
+       return (void *)((long)key | QP_TRIE_LEAF_NODE_MASK);
+}
+
+static inline struct qp_trie_key *to_leaf_node(void *node)
+{
+       return (void *)((long)node & ~QP_TRIE_LEAF_NODE_MASK);
+}
+
+static inline bool is_branch_node(void *node)
+{
+       return !((long)node & QP_TRIE_LEAF_NODE_MASK);
+}
+
+static inline bool is_same_key(const struct qp_trie_key *k, const unsigned char *data,
+                              unsigned int len)
+{
+       return k->len == len && !memcmp(k->data, data, len);
+}
+
+static inline void *qp_trie_leaf_value(const struct qp_trie_key *key)
+{
+       return (void *)key + sizeof(*key) + key->len;
+}
+
+static inline unsigned int calc_twig_index(unsigned int mask, unsigned int bitmap)
+{
+       return hweight32(mask & (bitmap - 1));
+}
+
+static inline unsigned int calc_twig_nr(unsigned int bitmap)
+{
+       return hweight32(bitmap);
+}
+
+static inline unsigned int nibble_to_bitmap(unsigned char nibble)
+{
+       return 1U << (nibble + 1);
+}
+
+static inline unsigned int index_to_byte_index(unsigned int index)
+{
+       return index >> QP_TRIE_BYTE_INDEX_SHIFT;
+}
+
+static inline unsigned int calc_br_bitmap(unsigned int index, const unsigned char *data,
+                                         unsigned int len)
+{
+       unsigned int byte;
+       unsigned char nibble;
+
+       if (index == QP_TRIE_ROOT_NODE_INDEX)
+               return 1;
+
+       byte = index_to_byte_index(index);
+       if (byte >= len)
+               return 1;
+
+       nibble = data[byte];
+       /* lower nibble */
+       if ((index >> QP_TRIE_NIBBLE_SHIFT) & 1)
+               nibble &= 0xf;
+       else
+               nibble >>= 4;
+       return nibble_to_bitmap(nibble);
+}
+
+static void qp_trie_free_twigs_rcu(struct rcu_head *rcu)
+{
+       struct qp_trie_branch *twigs = container_of(rcu, struct qp_trie_branch, rcu);
+       unsigned int idx = twigs->to_free_idx;
+
+       if (idx != QP_TRIE_TWIGS_FREE_NONE_IDX)
+               kfree(to_leaf_node(rcu_access_pointer(twigs->nodes[idx])));
+       kfree(twigs);
+}
+
+static void qp_trie_branch_free(struct qp_trie_branch *twigs, unsigned int to_free_idx)
+{
+       twigs->to_free_idx = to_free_idx;
+       call_rcu(&twigs->rcu, qp_trie_free_twigs_rcu);
+}
+
+static inline struct qp_trie_branch *
+qp_trie_branch_new(struct bpf_map *map, unsigned int nr)
+{
+       struct qp_trie_branch *a;
+
+       a = bpf_map_kmalloc_node(map, sizeof(*a) + nr * sizeof(*a->nodes),
+                                GFP_NOWAIT | __GFP_NOWARN, map->numa_node);
+       return a;
+}
+
+static inline void qp_trie_assign_parent(struct qp_trie_branch *parent, void *node)
+{
+       if (is_branch_node(node))
+               rcu_assign_pointer(((struct qp_trie_branch *)node)->parent, parent);
+}
+
+static void qp_trie_update_parent(struct qp_trie_branch *parent, unsigned int nr)
+{
+       unsigned int i;
+
+       for (i = 0; i < nr; i++)
+               qp_trie_assign_parent(parent, rcu_dereference_protected(parent->nodes[i], 1));
+}
+
+/* new_node can be either a leaf node or a branch node */
+static struct qp_trie_branch *
+qp_trie_branch_replace(struct bpf_map *map, struct qp_trie_branch *old, unsigned int bitmap,
+                      void *new_node)
+{
+       unsigned int nr = calc_twig_nr(old->bitmap);
+       unsigned int p = calc_twig_index(old->bitmap, bitmap);
+       struct qp_trie_branch *twigs;
+
+       twigs = qp_trie_branch_new(map, nr);
+       if (!twigs)
+               return NULL;
+
+       if (p)
+               memcpy(twigs->nodes, old->nodes, p * sizeof(*twigs->nodes));
+
+       rcu_assign_pointer(twigs->nodes[p], new_node);
+
+       if (nr - 1 > p)
+               memcpy(&twigs->nodes[p+1], &old->nodes[p+1], (nr - 1 - p) * sizeof(*twigs->nodes));
+
+       twigs->index = old->index;
+       twigs->bitmap = old->bitmap;
+       /* twigs will not be visible to reader until rcu_assign_pointer(), so
+        * use RCU_INIT_POINTER() here.
+        */
+       RCU_INIT_POINTER(twigs->parent, old->parent);
+
+       /* Initialize ->parent of parent node first, then update ->parent for
+        * child nodes after parent node is fully initialized.
+        */
+       qp_trie_update_parent(twigs, nr);
+
+       return twigs;
+}
+
+static struct qp_trie_branch *
+qp_trie_branch_insert(struct bpf_map *map, struct qp_trie_branch *old, unsigned int bitmap,
+                     const struct qp_trie_key *new)
+{
+       unsigned int nr = calc_twig_nr(old->bitmap);
+       unsigned int p = calc_twig_index(old->bitmap, bitmap);
+       struct qp_trie_branch *twigs;
+
+       twigs = qp_trie_branch_new(map, nr + 1);
+       if (!twigs)
+               return NULL;
+
+       if (p)
+               memcpy(twigs->nodes, old->nodes, p * sizeof(*twigs->nodes));
+
+       rcu_assign_pointer(twigs->nodes[p], to_child_node(new));
+
+       if (nr > p)
+               memcpy(&twigs->nodes[p+1], &old->nodes[p], (nr - p) * sizeof(*twigs->nodes));
+
+       twigs->bitmap = old->bitmap | bitmap;
+       twigs->index = old->index;
+       RCU_INIT_POINTER(twigs->parent, old->parent);
+
+       qp_trie_update_parent(twigs, nr + 1);
+
+       return twigs;
+}
+
+static struct qp_trie_branch *
+qp_trie_branch_remove(struct bpf_map *map, struct qp_trie_branch *old, unsigned int bitmap)
+{
+       unsigned int nr = calc_twig_nr(old->bitmap);
+       unsigned int p = calc_twig_index(old->bitmap, bitmap);
+       struct qp_trie_branch *twigs;
+
+       twigs = qp_trie_branch_new(map, nr - 1);
+       if (!twigs)
+               return NULL;
+
+       if (p)
+               memcpy(twigs->nodes, old->nodes, p * sizeof(*twigs->nodes));
+       if (nr - 1 > p)
+               memcpy(&twigs->nodes[p], &old->nodes[p+1], (nr - 1 - p) * sizeof(*twigs->nodes));
+
+       twigs->bitmap = old->bitmap & ~bitmap;
+       twigs->index = old->index;
+       RCU_INIT_POINTER(twigs->parent, old->parent);
+
+       qp_trie_update_parent(twigs, nr - 1);
+
+       return twigs;
+}
+
+static struct qp_trie_key *
+qp_trie_init_leaf_node(struct bpf_map *map, const struct bpf_dynptr_kern *k, void *v)
+{
+       unsigned int key_size, total;
+       struct qp_trie_key *new;
+
+       key_size = bpf_dynptr_get_size(k);
+       if (!key_size || key_size > (u32)map->map_extra)
+               return ERR_PTR(-EINVAL);
+
+       total = round_up(sizeof(*new) + key_size + map->value_size, QP_TRIE_LEAF_ALLOC_ALIGN);
+       new = bpf_map_kmalloc_node(map, total, GFP_NOWAIT | __GFP_NOWARN, map->numa_node);
+       if (!new)
+               return ERR_PTR(-ENOMEM);
+
+       new->len = key_size;
+       memcpy(new->data, k->data + k->offset, key_size);
+       memcpy((void *)&new[1] + key_size, v, map->value_size);
+
+       return new;
+}
+
+static bool calc_prefix_len(const struct qp_trie_key *s_key, const struct qp_trie_key *n_key,
+                           unsigned int *index)
+{
+       unsigned int i, len = min(s_key->len, n_key->len);
+       unsigned char diff = 0;
+
+       for (i = 0; i < len; i++) {
+               diff = s_key->data[i] ^ n_key->data[i];
+               if (diff)
+                       break;
+       }
+
+       *index = (i << QP_TRIE_BYTE_INDEX_SHIFT) | QP_TRIE_NON_ROOT_NODE_MASK;
+       if (!diff)
+               return s_key->len == n_key->len;
+
+       *index += (diff & 0xf0) ? 0 : (1U << QP_TRIE_NIBBLE_SHIFT);
+       return false;
+}
+
+static int qp_trie_new_branch(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+                             unsigned int bitmap, void *sibling, struct qp_trie_diff *d,
+                             const struct qp_trie_key *leaf)
+{
+       struct qp_trie_branch *new_child_twigs, *new_twigs, *old_twigs;
+       struct bpf_map *map;
+       unsigned int iip;
+       int err;
+
+       map = &trie->map;
+       if (atomic_inc_return(&trie->entries) > map->max_entries) {
+               err = -ENOSPC;
+               goto dec_entries;
+       }
+
+       new_child_twigs = qp_trie_branch_new(map, 2);
+       if (!new_child_twigs) {
+               err = -ENOMEM;
+               goto dec_entries;
+       }
+
+       new_child_twigs->index = d->index;
+       new_child_twigs->bitmap = d->sibling_bm | d->new_bm;
+
+       iip = calc_twig_index(new_child_twigs->bitmap, d->sibling_bm);
+       RCU_INIT_POINTER(new_child_twigs->nodes[iip], sibling);
+       rcu_assign_pointer(new_child_twigs->nodes[!iip], to_child_node(leaf));
+       RCU_INIT_POINTER(new_child_twigs->parent, NULL);
+
+       old_twigs = rcu_dereference_protected(*parent, 1);
+       new_twigs = qp_trie_branch_replace(map, old_twigs, bitmap, new_child_twigs);
+       if (!new_twigs) {
+               err = -ENOMEM;
+               goto free_child_twigs;
+       }
+
+       qp_trie_assign_parent(new_child_twigs, sibling);
+       rcu_assign_pointer(*parent, new_twigs);
+       qp_trie_branch_free(old_twigs, QP_TRIE_TWIGS_FREE_NONE_IDX);
+
+       return 0;
+
+free_child_twigs:
+       kfree(new_child_twigs);
+dec_entries:
+       atomic_dec(&trie->entries);
+       return err;
+}
+
+static int qp_trie_ext_branch(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+                             const struct qp_trie_key *new, unsigned int bitmap)
+{
+       struct qp_trie_branch *old_twigs, *new_twigs;
+       struct bpf_map *map;
+       int err;
+
+       map = &trie->map;
+       if (atomic_inc_return(&trie->entries) > map->max_entries) {
+               err = -ENOSPC;
+               goto dec_entries;
+       }
+
+       old_twigs = rcu_dereference_protected(*parent, 1);
+       new_twigs = qp_trie_branch_insert(map, old_twigs, bitmap, new);
+       if (!new_twigs) {
+               err = -ENOMEM;
+               goto dec_entries;
+       }
+
+       rcu_assign_pointer(*parent, new_twigs);
+       qp_trie_branch_free(old_twigs, QP_TRIE_TWIGS_FREE_NONE_IDX);
+
+       return 0;
+
+dec_entries:
+       atomic_dec(&trie->entries);
+       return err;
+}
+
+static int qp_trie_add_leaf_node(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+                                const struct qp_trie_key *new)
+{
+       struct bpf_map *map = &trie->map;
+       struct qp_trie_branch *twigs;
+       int err;
+
+       if (atomic_inc_return(&trie->entries) > map->max_entries) {
+               err = -ENOSPC;
+               goto dec_entries;
+       }
+
+       twigs = qp_trie_branch_new(map, 1);
+       if (!twigs) {
+               err = -ENOMEM;
+               goto dec_entries;
+       }
+       twigs->index = QP_TRIE_ROOT_NODE_INDEX;
+       twigs->bitmap = 1;
+       RCU_INIT_POINTER(twigs->parent, NULL);
+       rcu_assign_pointer(twigs->nodes[0], to_child_node(new));
+
+       rcu_assign_pointer(*parent, twigs);
+
+       return 0;
+dec_entries:
+       atomic_dec(&trie->entries);
+       return err;
+}
+
+static int qp_trie_rep_leaf_node(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+                                const struct qp_trie_key *new, unsigned int bitmap)
+{
+       struct qp_trie_branch *old_twigs, *new_twigs;
+       struct bpf_map *map = &trie->map;
+
+       /* Only branch node is freed by RCU, so replace the old branch node
+        * and free the old leaf node together with the old branch node.
+        */
+       old_twigs = rcu_dereference_protected(*parent, 1);
+       new_twigs = qp_trie_branch_replace(map, old_twigs, bitmap, to_child_node(new));
+       if (!new_twigs)
+               return -ENOMEM;
+
+       rcu_assign_pointer(*parent, new_twigs);
+
+       qp_trie_branch_free(old_twigs, calc_twig_index(old_twigs->bitmap, bitmap));
+
+       return 0;
+}
+
+static int qp_trie_remove_leaf(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+                              unsigned int bitmap, const struct qp_trie_key *node)
+{
+       struct bpf_map *map = &trie->map;
+       struct qp_trie_branch *new, *old;
+       unsigned int nr;
+
+       old = rcu_dereference_protected(*parent, 1);
+       nr = calc_twig_nr(old->bitmap);
+       if (nr > 2) {
+               new = qp_trie_branch_remove(map, old, bitmap);
+               if (!new)
+                       return -ENOMEM;
+       } else {
+               new = NULL;
+       }
+
+       rcu_assign_pointer(*parent, new);
+
+       qp_trie_branch_free(old, calc_twig_index(old->bitmap, bitmap));
+
+       atomic_dec(&trie->entries);
+
+       return 0;
+}
+
+static int qp_trie_merge_node(struct qp_trie *trie, struct qp_trie_branch __rcu **grand_parent,
+                             struct qp_trie_branch *parent, unsigned int parent_bitmap,
+                             unsigned int bitmap)
+{
+       struct qp_trie_branch *old_twigs, *new_twigs;
+       struct bpf_map *map = &trie->map;
+       void *new_sibling;
+       unsigned int iip;
+
+       iip = calc_twig_index(parent->bitmap, bitmap);
+       new_sibling = rcu_dereference_protected(parent->nodes[!iip], 1);
+
+       old_twigs = rcu_dereference_protected(*grand_parent, 1);
+       new_twigs = qp_trie_branch_replace(map, old_twigs, parent_bitmap, new_sibling);
+       if (!new_twigs)
+               return -ENOMEM;
+
+       rcu_assign_pointer(*grand_parent, new_twigs);
+
+       qp_trie_branch_free(old_twigs, QP_TRIE_TWIGS_FREE_NONE_IDX);
+       qp_trie_branch_free(parent, iip);
+
+       atomic_dec(&trie->entries);
+
+       return 0;
+}
+
+static int qp_trie_alloc_check(union bpf_attr *attr)
+{
+       if (!bpf_capable())
+               return -EPERM;
+
+       if ((attr->map_flags & QP_TRIE_MANDATORY_FLAG_MASK) != QP_TRIE_MANDATORY_FLAG_MASK ||
+           attr->map_flags & ~QP_TRIE_CREATE_FLAG_MASK ||
+           !bpf_map_flags_access_ok(attr->map_flags))
+               return -EINVAL;
+
+       if (!attr->max_entries || !attr->value_size)
+               return -EINVAL;
+
+       /* Key and value are allocated together in qp_trie_init_leaf_node() */
+       if (round_up((u64)sizeof(struct qp_trie_key) + (u32)attr->map_extra + attr->value_size,
+                    QP_TRIE_LEAF_ALLOC_ALIGN) >= KMALLOC_MAX_SIZE)
+               return -E2BIG;
+
+       return 0;
+}
+
+static struct bpf_map *qp_trie_alloc(union bpf_attr *attr)
+{
+       struct qp_trie *trie;
+       unsigned int i;
+
+       trie = bpf_map_area_alloc(sizeof(*trie), bpf_map_attr_numa_node(attr));
+       if (!trie)
+               return ERR_PTR(-ENOMEM);
+
+       /* roots are zeroed by bpf_map_area_alloc() */
+       for (i = 0; i < QP_TRIE_NR_SUBTREE; i++)
+               spin_lock_init(&trie->locks[i]);
+
+       atomic_set(&trie->entries, 0);
+       bpf_map_init_from_attr(&trie->map, attr);
+
+       return &trie->map;
+}
+
+static void qp_trie_free_subtree(void *root)
+{
+       struct qp_trie_branch *parent = NULL;
+       struct qp_trie_key *cur = NULL;
+       void *node = root;
+
+       /*
+        * Depth-first deletion
+        *
+        * 1. find left-most key and its parent
+        * 2. get next sibling Y from parent
+        *    (a) Y is leaf node: continue
+        *    (b) Y is branch node: goto step 1
+        *    (c) no more sibling: backtrace upwards if parent is not NULL and
+        *        goto step 1
+        */
+       do {
+               while (is_branch_node(node)) {
+                       parent = node;
+                       node = rcu_dereference_raw(parent->nodes[0]);
+               }
+
+               cur = to_leaf_node(node);
+               while (parent) {
+                       unsigned int iip, bitmap, nr;
+                       void *ancestor;
+
+                       bitmap = calc_br_bitmap(parent->index, cur->data, cur->len);
+                       iip = calc_twig_index(parent->bitmap, bitmap) + 1;
+                       nr = calc_twig_nr(parent->bitmap);
+
+                       for (; iip < nr; iip++) {
+                               kfree(cur);
+
+                               node = rcu_dereference_raw(parent->nodes[iip]);
+                               if (is_branch_node(node))
+                                       break;
+
+                               cur = to_leaf_node(node);
+                       }
+                       if (iip < nr)
+                               break;
+
+                       ancestor = rcu_dereference_raw(parent->parent);
+                       kfree(parent);
+                       parent = ancestor;
+               }
+       } while (parent);
+
+       kfree(cur);
+}
+
+static void qp_trie_free(struct bpf_map *map)
+{
+       struct qp_trie *trie = container_of(map, struct qp_trie, map);
+       unsigned int i;
+
+       /* Wait for the pending qp_trie_free_twigs_rcu() */
+       rcu_barrier();
+
+       for (i = 0; i < ARRAY_SIZE(trie->roots); i++) {
+               void *root = rcu_dereference_raw(trie->roots[i]);
+
+               if (root)
+                       qp_trie_free_subtree(root);
+       }
+       bpf_map_area_free(trie);
+}
+
+static inline void qp_trie_copy_leaf(const struct qp_trie_key *leaf, struct bpf_dynptr_kern *key)
+{
+       memcpy(key->data + key->offset, leaf->data, leaf->len);
+       bpf_dynptr_set_size(key, leaf->len);
+}
+
+static void qp_trie_copy_min_key_from(void *root, struct bpf_dynptr_kern *key)
+{
+       void *node;
+
+       node = root;
+       while (is_branch_node(node))
+               node = rcu_dereference(((struct qp_trie_branch *)node)->nodes[0]);
+
+       qp_trie_copy_leaf(to_leaf_node(node), key);
+}
+
+static int qp_trie_lookup_min_key(struct qp_trie *trie, unsigned int from,
+                                 struct bpf_dynptr_kern *key)
+{
+       unsigned int i;
+
+       for (i = from; i < ARRAY_SIZE(trie->roots); i++) {
+               void *root = rcu_dereference(trie->roots[i]);
+
+               if (root) {
+                       qp_trie_copy_min_key_from(root, key);
+                       return 0;
+               }
+       }
+
+       return -ENOENT;
+}
+
+static int qp_trie_next_twigs_index(struct qp_trie_branch *twigs, unsigned int bitmap)
+{
+       unsigned int idx, nr, next;
+
+       /* bitmap may not in twigs->bitmap */
+       idx = calc_twig_index(twigs->bitmap, bitmap);
+       nr = calc_twig_nr(twigs->bitmap);
+
+       next = idx;
+       if (twigs->bitmap & bitmap)
+               next += 1;
+
+       if (next >= nr)
+               return -1;
+       return next;
+}
+
+static int qp_trie_lookup_next_node(struct qp_trie *trie, const struct bpf_dynptr_kern *key,
+                                   struct bpf_dynptr_kern *next_key)
+{
+       const struct qp_trie_key *found;
+       struct qp_trie_branch *parent;
+       const unsigned char *data;
+       unsigned int data_len;
+       void *node, *next;
+
+       /* Non-existent key, so restart from the beginning */
+       data = key->data + key->offset;
+       node = rcu_dereference(trie->roots[*data]);
+       if (!node)
+               return qp_trie_lookup_min_key(trie, 0, next_key);
+
+       parent = NULL;
+       data_len = bpf_dynptr_get_size(key);
+       while (is_branch_node(node)) {
+               struct qp_trie_branch *br = node;
+               unsigned int iip, bitmap;
+
+               bitmap = calc_br_bitmap(br->index, data, data_len);
+               if (bitmap & br->bitmap)
+                       iip = calc_twig_index(br->bitmap, bitmap);
+               else
+                       iip = 0;
+
+               parent = br;
+               node = rcu_dereference(br->nodes[iip]);
+       }
+       found = to_leaf_node(node);
+       if (!is_same_key(found, data, data_len))
+               return qp_trie_lookup_min_key(trie, 0, next_key);
+
+       /* Pair with store release in rcu_assign_pointer(*parent, twigs) to
+        * ensure reading node->parent will not return the old parent if
+        * the node
is found by following the newly-created parent. + */ + smp_rmb(); + + next = NULL; + while (parent) { + unsigned int bitmap; + int next_idx; + + bitmap = calc_br_bitmap(parent->index, data, data_len); + next_idx = qp_trie_next_twigs_index(parent, bitmap); + if (next_idx >= 0) { + next = rcu_dereference(parent->nodes[next_idx]); + break; + } + parent = rcu_dereference(parent->parent); + } + + /* Goto next sub-tree */ + if (!next) + return qp_trie_lookup_min_key(trie, *data + 1, next_key); + + if (!is_branch_node(next)) + qp_trie_copy_leaf(to_leaf_node(next), next_key); + else + qp_trie_copy_min_key_from(next, next_key); + + return 0; +} + +/* Called from syscall */ +static int qp_trie_get_next_key(struct bpf_map *map, void *key, void *next_key) +{ + struct qp_trie *trie = container_of(map, struct qp_trie, map); + int err; + + if (!key) + err = qp_trie_lookup_min_key(trie, 0, next_key); + else + err = qp_trie_lookup_next_node(trie, key, next_key); + return err; +} + +/* Called from syscall or from eBPF program */ +static void *qp_trie_lookup_elem(struct bpf_map *map, void *key) +{ + struct qp_trie *trie = container_of(map, struct qp_trie, map); + const struct bpf_dynptr_kern *dynptr_key = key; + const struct qp_trie_key *found; + const unsigned char *data; + unsigned int data_len; + void *node, *value; + + /* Dynptr with zero length is possible, but is invalid for qp-trie */ + data_len = bpf_dynptr_get_size(dynptr_key); + if (!data_len) + return NULL; + + data = dynptr_key->data + dynptr_key->offset; + node = rcu_dereference_check(trie->roots[*data], rcu_read_lock_bh_held()); + if (!node) + return NULL; + + value = NULL; + while (is_branch_node(node)) { + struct qp_trie_branch *br = node; + unsigned int bitmap; + unsigned int iip; + + /* When byte index equals with key len, the target key + * may be in twigs->nodes[0]. 
+ */ + if (index_to_byte_index(br->index) > data_len) + goto done; + + bitmap = calc_br_bitmap(br->index, data, data_len); + if (!(bitmap & br->bitmap)) + goto done; + + iip = calc_twig_index(br->bitmap, bitmap); + node = rcu_dereference_check(br->nodes[iip], rcu_read_lock_bh_held()); + } + + found = to_leaf_node(node); + if (is_same_key(found, data, data_len)) + value = qp_trie_leaf_value(found); +done: + return value; +} + +/* Called from syscall or from eBPF program */ +static int qp_trie_update_elem(struct bpf_map *map, void *key, void *value, u64 flags) +{ + struct qp_trie *trie = container_of(map, struct qp_trie, map); + const struct qp_trie_key *leaf_key, *new_key; + struct qp_trie_branch __rcu **parent; + struct qp_trie_diff d; + unsigned int bitmap; + void __rcu **node; + spinlock_t *lock; + unsigned char c; + bool equal; + int err; + + if (flags > BPF_EXIST) + return -EINVAL; + + /* The content of key may change, so copy it firstly */ + new_key = qp_trie_init_leaf_node(map, key, value); + if (IS_ERR(new_key)) + return PTR_ERR(new_key); + + c = new_key->data[0]; + lock = &trie->locks[c]; + spin_lock(lock); + parent = (struct qp_trie_branch __rcu **)&trie->roots[c]; + if (!rcu_dereference_protected(*parent, 1)) { + if (flags == BPF_EXIST) { + err = -ENOENT; + goto unlock; + } + err = qp_trie_add_leaf_node(trie, parent, new_key); + goto unlock; + } + + bitmap = 1; + node = &rcu_dereference_protected(*parent, 1)->nodes[0]; + while (is_branch_node(rcu_dereference_protected(*node, 1))) { + struct qp_trie_branch *br = rcu_dereference_protected(*node, 1); + unsigned int iip; + + bitmap = calc_br_bitmap(br->index, new_key->data, new_key->len); + if (bitmap & br->bitmap) + iip = calc_twig_index(br->bitmap, bitmap); + else + iip = 0; + parent = (struct qp_trie_branch __rcu **)node; + node = &br->nodes[iip]; + } + + leaf_key = to_leaf_node(rcu_dereference_protected(*node, 1)); + equal = calc_prefix_len(leaf_key, new_key, &d.index); + if (equal) { + if (flags == BPF_NOEXIST) { + err = -EEXIST; + goto unlock; + } + err = qp_trie_rep_leaf_node(trie, parent, new_key, bitmap); + goto unlock; + } + + d.sibling_bm = calc_br_bitmap(d.index, leaf_key->data, leaf_key->len); + d.new_bm = calc_br_bitmap(d.index, new_key->data, new_key->len); + + bitmap = 1; + parent = (struct qp_trie_branch __rcu **)&trie->roots[c]; + node = &rcu_dereference_protected(*parent, 1)->nodes[0]; + while (is_branch_node(rcu_dereference_protected(*node, 1))) { + struct qp_trie_branch *br = rcu_dereference_protected(*node, 1); + unsigned int iip; + + if (d.index < br->index) + goto new_branch; + + parent = (struct qp_trie_branch __rcu **)node; + if (d.index == br->index) { + if (flags == BPF_EXIST) { + err = -ENOENT; + goto unlock; + } + err = qp_trie_ext_branch(trie, parent, new_key, d.new_bm); + goto unlock; + } + + bitmap = calc_br_bitmap(br->index, new_key->data, new_key->len); + iip = calc_twig_index(br->bitmap, bitmap); + node = &br->nodes[iip]; + } + +new_branch: + if (flags == BPF_EXIST) { + err = -ENOENT; + goto unlock; + } + err = qp_trie_new_branch(trie, parent, bitmap, rcu_dereference_protected(*node, 1), + &d, new_key); +unlock: + spin_unlock(lock); + if (err) + kfree(new_key); + return err; +} + +/* Called from syscall or from eBPF program */ +static int qp_trie_delete_elem(struct bpf_map *map, void *key) +{ + struct qp_trie *trie = container_of(map, struct qp_trie, map); + unsigned int bitmap, parent_bitmap, data_len, nr; + struct qp_trie_branch __rcu **parent, **grand_parent; + const struct bpf_dynptr_kern 
*dynptr_key; + const struct qp_trie_key *found; + const unsigned char *data; + void __rcu **node; + spinlock_t *lock; + unsigned char c; + int err; + + dynptr_key = key; + data_len = bpf_dynptr_get_size(dynptr_key); + if (!data_len) + return -EINVAL; + + err = -ENOENT; + data = dynptr_key->data + dynptr_key->offset; + c = *data; + lock = &trie->locks[c]; + spin_lock(lock); + parent = (struct qp_trie_branch __rcu **)&trie->roots[c]; + if (!*parent) + goto unlock; + + grand_parent = NULL; + parent_bitmap = bitmap = 1; + node = &rcu_dereference_protected(*parent, 1)->nodes[0]; + while (is_branch_node(rcu_dereference_protected(*node, 1))) { + struct qp_trie_branch *br = rcu_dereference_protected(*node, 1); + unsigned int iip; + + if (index_to_byte_index(br->index) > data_len) + goto unlock; + + parent_bitmap = bitmap; + bitmap = calc_br_bitmap(br->index, data, data_len); + if (!(bitmap & br->bitmap)) + goto unlock; + + grand_parent = parent; + parent = (struct qp_trie_branch __rcu **)node; + iip = calc_twig_index(br->bitmap, bitmap); + node = &br->nodes[iip]; + } + + found = to_leaf_node(rcu_dereference_protected(*node, 1)); + if (!is_same_key(found, data, data_len)) + goto unlock; + + nr = calc_twig_nr(rcu_dereference_protected(*parent, 1)->bitmap); + if (nr != 2) + err = qp_trie_remove_leaf(trie, parent, bitmap, found); + else + err = qp_trie_merge_node(trie, grand_parent, rcu_dereference_protected(*parent, 1), + parent_bitmap, bitmap); +unlock: + spin_unlock(lock); + return err; +} + +static int qp_trie_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type) +{ + return 0; +} + +BTF_ID_LIST_SINGLE(qp_trie_map_btf_ids, struct, qp_trie) +const struct bpf_map_ops qp_trie_map_ops = { + .map_alloc_check = qp_trie_alloc_check, + .map_alloc = qp_trie_alloc, + .map_free = qp_trie_free, + .map_get_next_key = qp_trie_get_next_key, + .map_lookup_elem = qp_trie_lookup_elem, + .map_update_elem = qp_trie_update_elem, + .map_delete_elem = qp_trie_delete_elem, + .map_meta_equal = bpf_map_meta_equal, + .map_check_btf = qp_trie_check_btf, + .map_btf_id = &qp_trie_map_btf_ids[0], +}; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 600c3fcee37a..1591f0df46f3 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -928,6 +928,7 @@ enum bpf_map_type { BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_BLOOM_FILTER, + BPF_MAP_TYPE_QP_TRIE, }; /* Note that tracing related programs such as From patchwork Sat Sep 17 15:31:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 12979179 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6953C6FA91 for ; Sat, 17 Sep 2022 15:13:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229591AbiIQPNb (ORCPT ); Sat, 17 Sep 2022 11:13:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42128 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229538AbiIQPN3 (ORCPT ); Sat, 17 Sep 2022 11:13:29 -0400 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with 
ESMTPS id AD7C6201A7 for ; Sat, 17 Sep 2022 08:13:28 -0700 (PDT)
Received: from mail02.huawei.com (unknown [172.30.67.143]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4MVDtX1XrKzlClX for ; Sat, 17 Sep 2022 23:11:48 +0800 (CST)
Received: from huaweicloud.com (unknown [10.175.124.27]) by APP2 (Coremail) with SMTP id Syh0CgAnenMO5CVjskryAw--.61987S11; Sat, 17 Sep 2022 23:13:26 +0800 (CST)
From: Hou Tao
To: bpf@vger.kernel.org, Andrii Nakryiko
Cc: Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , Martin KaFai Lau , KP Singh , "David S . Miller" , Jakub Kicinski , Stanislav Fomichev , Jiri Olsa , John Fastabend , Lorenz Bauer , "Paul E . McKenney" , houtao1@huawei.com
Subject: [PATCH bpf-next 07/10] selftests/bpf: Add two new dynptr_fail cases for map key
Date: Sat, 17 Sep 2022 23:31:22 +0800
Message-Id: <20220917153125.2001645-8-houtao@huaweicloud.com>
X-Mailer: git-send-email 2.29.2
In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com>
References: <20220917153125.2001645-1-houtao@huaweicloud.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Hou Tao

For now, bpf_dynptr is also not usable as a map key, so add two test cases to ensure that it is rejected: one directly uses a bpf_dynptr as the map key, and the other uses a struct with an embedded bpf_dynptr as the map key. Both are expected to fail to load with "invalid indirect read from stack", since the stack slots backing an initialized dynptr are not readable as ordinary memory.
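For contrast, a minimal sketch only (assuming the BPF_MAP_TYPE_QP_TRIE and BPF_F_DYNPTR_KEY additions from earlier in this series, with an illustrative 64-byte maximum key length in map_extra): the same lookup pattern is expected to be accepted once the map itself is declared with a dynptr key:

  struct {
          __uint(type, BPF_MAP_TYPE_QP_TRIE);
          __uint(max_entries, 1);
          __type(key, struct bpf_dynptr);
          __type(value, __u32);
          __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY);
          __uint(map_extra, 64); /* illustrative max key length */
  } qp_trie SEC(".maps");

  /* ... inside a program: */
  struct bpf_dynptr ptr;

  get_map_val_dynptr(&ptr);
  bpf_map_lookup_elem(&qp_trie, &ptr); /* dynptr key on a dynptr-keyed map */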
Signed-off-by: Hou Tao --- .../testing/selftests/bpf/prog_tests/dynptr.c | 2 + .../testing/selftests/bpf/progs/dynptr_fail.c | 43 +++++++++++++++++++ 2 files changed, 45 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c index bcf80b9f7c27..1df50fe1a741 100644 --- a/tools/testing/selftests/bpf/prog_tests/dynptr.c +++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c @@ -20,6 +20,8 @@ static struct { {"ringbuf_invalid_api", "type=mem expected=alloc_mem"}, {"add_dynptr_to_map1", "invalid indirect read from stack"}, {"add_dynptr_to_map2", "invalid indirect read from stack"}, + {"add_dynptr_to_key1", "invalid indirect read from stack"}, + {"add_dynptr_to_key2", "invalid indirect read from stack"}, {"data_slice_out_of_bounds_ringbuf", "value is outside of the allowed memory range"}, {"data_slice_out_of_bounds_map_value", "value is outside of the allowed memory range"}, {"data_slice_use_after_release1", "invalid mem access 'scalar'"}, diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c index b0f08ff024fb..85cc32999e8e 100644 --- a/tools/testing/selftests/bpf/progs/dynptr_fail.c +++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c @@ -35,6 +35,20 @@ struct { __type(value, __u32); } array_map3 SEC(".maps"); +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 1); + __type(key, struct bpf_dynptr); + __type(value, __u32); +} hash_map1 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 1); + __type(key, struct test_info); + __type(value, __u32); +} hash_map2 SEC(".maps"); + struct sample { int pid; long value; @@ -206,6 +220,35 @@ int add_dynptr_to_map2(void *ctx) return 0; } +/* Can't use a dynptr as key of normal map */ +SEC("?raw_tp") +int add_dynptr_to_key1(void *ctx) +{ + struct bpf_dynptr ptr; + + get_map_val_dynptr(&ptr); + bpf_map_lookup_elem(&hash_map1, &ptr); + + return 0; +} + +/* Can't use a struct with an embedded dynptr as key of normal map */ +SEC("?raw_tp") +int add_dynptr_to_key2(void *ctx) +{ + struct test_info x; + + x.x = 0; + bpf_ringbuf_reserve_dynptr(&ringbuf, val, 0, &x.ptr); + + /* this should fail */ + bpf_map_lookup_elem(&hash_map2, &x); + + bpf_ringbuf_submit_dynptr(&x.ptr, 0); + + return 0; +} + /* A data slice can't be accessed out of bounds */ SEC("?raw_tp") int data_slice_out_of_bounds_ringbuf(void *ctx) From patchwork Sat Sep 17 15:31:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 12979181 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9708FC6FA94 for ; Sat, 17 Sep 2022 15:13:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229496AbiIQPNf (ORCPT ); Sat, 17 Sep 2022 11:13:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42150 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229458AbiIQPNa (ORCPT ); Sat, 17 Sep 2022 11:13:30 -0400 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 78F692D1CB for ; Sat, 17 Sep 2022 08:13:29 -0700 (PDT) Received: from mail02.huawei.com (unknown 
[172.30.67.143]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4MVDtX6SMdzl1km for ; Sat, 17 Sep 2022 23:11:48 +0800 (CST)
Received: from huaweicloud.com (unknown [10.175.124.27]) by APP2 (Coremail) with SMTP id Syh0CgAnenMO5CVjskryAw--.61987S12; Sat, 17 Sep 2022 23:13:27 +0800 (CST)
From: Hou Tao
To: bpf@vger.kernel.org, Andrii Nakryiko
Cc: Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , Martin KaFai Lau , KP Singh , "David S . Miller" , Jakub Kicinski , Stanislav Fomichev , Jiri Olsa , John Fastabend , Lorenz Bauer , "Paul E . McKenney" , houtao1@huawei.com
Subject: [PATCH bpf-next 08/10] selftests/bpf: Add test case for basic qp-trie map operations
Date: Sat, 17 Sep 2022 23:31:23 +0800
Message-Id: <20220917153125.2001645-9-houtao@huaweicloud.com>
X-Mailer: git-send-email 2.29.2
In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com>
References: <20220917153125.2001645-1-houtao@huaweicloud.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Hou Tao

First show that lookup on a qp-trie map behaves differently from lookup on a hash table, because a qp-trie map only uses the length specified in the key buffer instead of the full key size. Then check that update and deletion operations on a qp-trie map work as expected when driven through bpf_dynptr. Also check that a zero-sized bpf_dynptr is rejected by map operations.
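The following userspace sketch illustrates that difference (bpf_dynptr_user_init() is the helper added earlier in this series; htab_fd, trie_fd and FILE_PATH_SIZE are illustrative):

  char raw[FILE_PATH_SIZE] = {}; /* zero padding matters: htab hashes all FILE_PATH_SIZE bytes */
  struct bpf_dynptr_user dynptr;
  unsigned int value;

  strncpy(raw, "/tmp/foo", sizeof(raw) - 1);
  bpf_map_lookup_elem(htab_fd, raw, &value); /* key is the whole zero-padded buffer */

  /* the qp-trie key is only the strlen() + 1 bytes described by the dynptr */
  bpf_dynptr_user_init(raw, strlen(raw) + 1, &dynptr);
  bpf_map_lookup_elem(trie_fd, &dynptr, &value);

This is why the test below stores the full zero-padded buffer in the hash table but only strlen(name) + 1 bytes in the qp-trie.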
Signed-off-by: Hou Tao --- tools/testing/selftests/bpf/DENYLIST.s390x | 1 + .../selftests/bpf/prog_tests/qp_trie_test.c | 91 +++++++++++++++ .../selftests/bpf/progs/qp_trie_test.c | 110 ++++++++++++++++++ 3 files changed, 202 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/qp_trie_test.c create mode 100644 tools/testing/selftests/bpf/progs/qp_trie_test.c diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x index 168c5b287b5c..e18d6f5ffeef 100644 --- a/tools/testing/selftests/bpf/DENYLIST.s390x +++ b/tools/testing/selftests/bpf/DENYLIST.s390x @@ -71,3 +71,4 @@ cb_refs # expected error message unexpected err cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc) htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline) tracing_struct # failed to auto-attach: -524 (trampoline) +qp_trie_test # failed to attach: ERROR: strerror_r(-524)=22 (trampoline) diff --git a/tools/testing/selftests/bpf/prog_tests/qp_trie_test.c b/tools/testing/selftests/bpf/prog_tests/qp_trie_test.c new file mode 100644 index 000000000000..24d5719f4f7f --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/qp_trie_test.c @@ -0,0 +1,91 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include +#include "qp_trie_test.skel.h" + +static int setup_maps(struct qp_trie_test *skel, char *name, unsigned int value) +{ +#define FILE_PATH_SIZE 64 + struct bpf_dynptr_user dynptr; + char raw[FILE_PATH_SIZE]; + char zero; + int fd, err; + + memset(raw, 0, sizeof(raw)); + strncpy(raw, name, sizeof(raw) - 1); + + fd = bpf_map__fd(skel->maps.trie); + /* Full path returned from d_path includes the trailing terminator */ + bpf_dynptr_user_init(name, strlen(name) + 1, &dynptr); + err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST); + if (!ASSERT_OK(err, "trie add name")) + return -EINVAL; + + zero = 0; + bpf_dynptr_user_init(&zero, 1, &dynptr); + err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST); + if (!ASSERT_OK(err, "trie add zero")) + return -EINVAL; + + fd = bpf_map__fd(skel->maps.htab); + err = bpf_map_update_elem(fd, raw, &value, BPF_NOEXIST); + if (!ASSERT_OK(err, "htab add")) + return -EINVAL; + + return 0; +} + +void test_qp_trie_test(void) +{ + char name[] = "/tmp/qp_trie_test"; + unsigned int value, new_value; + struct bpf_dynptr_user dynptr; + struct qp_trie_test *skel; + int err, fd; + char zero; + + skel = qp_trie_test__open(); + if (!ASSERT_OK_PTR(skel, "qp_trie_test__open()")) + return; + + err = qp_trie_test__load(skel); + if (!ASSERT_OK(err, "qp_trie_test__load()")) + goto out; + + value = time(NULL); + if (setup_maps(skel, name, value)) + goto out; + + skel->bss->pid = getpid(); + err = qp_trie_test__attach(skel); + if (!ASSERT_OK(err, "attach")) + goto out; + + fd = open(name, O_RDONLY | O_CREAT, 0644); + if (!ASSERT_GE(fd, 0, "open tmp file")) + goto out; + close(fd); + unlink(name); + + ASSERT_EQ(skel->bss->trie_value, value, "trie lookup str"); + ASSERT_EQ(skel->bss->htab_value, -1, "htab lookup bytes"); + ASSERT_FALSE(skel->bss->zero_sized_key_bad, "zero-sized key"); + + bpf_dynptr_user_init(name, strlen(name) + 1, &dynptr); + new_value = 0; + err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.trie), &dynptr, &new_value); + ASSERT_OK(err, "lookup elem"); + ASSERT_EQ(new_value, value + 1, "check new value"); + + zero = 0; + bpf_dynptr_user_init(&zero, 1, &dynptr); + err = 
bpf_map_lookup_elem(bpf_map__fd(skel->maps.trie), &dynptr, &new_value); + ASSERT_EQ(err, -ENOENT, "lookup deleted elem"); + +out: + qp_trie_test__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/qp_trie_test.c b/tools/testing/selftests/bpf/progs/qp_trie_test.c new file mode 100644 index 000000000000..c7cf3dfae971 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/qp_trie_test.c @@ -0,0 +1,110 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include +#include +#include + +char _license[] SEC("license") = "GPL"; + +struct path { +} __attribute__((preserve_access_index)); + +struct file { + struct path f_path; +} __attribute__((preserve_access_index)); + +#define FILE_PATH_SIZE 64 + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 2); + __uint(key_size, 4); + __uint(value_size, FILE_PATH_SIZE); +} array SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_QP_TRIE); + __uint(max_entries, 2); + __type(key, struct bpf_dynptr); + __type(value, unsigned int); + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY); + __uint(map_extra, FILE_PATH_SIZE); +} trie SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 1); + __uint(key_size, FILE_PATH_SIZE); + __uint(value_size, 4); +} htab SEC(".maps"); + +int pid = 0; +unsigned int trie_value = 0; +unsigned int htab_value = 0; +bool zero_sized_key_bad = false; + +SEC("fentry/filp_close") +int BPF_PROG(filp_close, struct file *filp) +{ + struct bpf_dynptr str_ptr, zero_ptr, zero_sized_ptr; + unsigned int new_value, *value; + int idx, len, err; + struct path *p; + char *raw; + + if (bpf_get_current_pid_tgid() >> 32 != pid) + return 0; + + idx = 0; + raw = bpf_map_lookup_elem(&array, &idx); + if (!raw) + return 0; + + p = &filp->f_path; + len = bpf_d_path(p, raw, FILE_PATH_SIZE); + if (len < 0 || len > FILE_PATH_SIZE) + return 0; + + bpf_dynptr_from_mem(raw, len, 0, &str_ptr); + value = bpf_map_lookup_elem(&trie, &str_ptr); + if (value) + trie_value = *value; + else + trie_value = -1; + + value = bpf_map_lookup_elem(&htab, raw); + if (value) + htab_value = *value; + else + htab_value = -1; + + /* Update qp_trie map */ + new_value = trie_value + 1; + bpf_map_update_elem(&trie, &str_ptr, &new_value, BPF_ANY); + + idx = 1; + raw = bpf_map_lookup_elem(&array, &idx); + if (!raw) + return 0; + bpf_dynptr_from_mem(raw, 1, 0, &zero_ptr); + bpf_map_delete_elem(&trie, &zero_ptr); + + /* Use zero-sized bpf_dynptr */ + bpf_dynptr_from_mem(raw, 0, 0, &zero_sized_ptr); + + value = bpf_map_lookup_elem(&trie, &zero_sized_ptr); + if (value) + zero_sized_key_bad = true; + err = bpf_map_update_elem(&trie, &zero_sized_ptr, &new_value, BPF_ANY); + if (err != -EINVAL) + zero_sized_key_bad = true; + err = bpf_map_delete_elem(&trie, &zero_sized_ptr); + if (err != -EINVAL) + zero_sized_key_bad = true; + + return 0; +} From patchwork Sat Sep 17 15:31:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 12979183 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48597C54EE9 for ; Sat, 17 Sep 2022 15:13:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229458AbiIQPNg (ORCPT ); 
Sat, 17 Sep 2022 11:13:36 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42176 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229601AbiIQPNc (ORCPT ); Sat, 17 Sep 2022 11:13:32 -0400
Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB0972F65E for ; Sat, 17 Sep 2022 08:13:29 -0700 (PDT)
Received: from mail02.huawei.com (unknown [172.30.67.143]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4MVDtY3pHzzlClr for ; Sat, 17 Sep 2022 23:11:49 +0800 (CST)
Received: from huaweicloud.com (unknown [10.175.124.27]) by APP2 (Coremail) with SMTP id Syh0CgAnenMO5CVjskryAw--.61987S13; Sat, 17 Sep 2022 23:13:28 +0800 (CST)
From: Hou Tao
To: bpf@vger.kernel.org, Andrii Nakryiko
Cc: Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , Martin KaFai Lau , KP Singh , "David S . Miller" , Jakub Kicinski , Stanislav Fomichev , Jiri Olsa , John Fastabend , Lorenz Bauer , "Paul E . McKenney" , houtao1@huawei.com
Subject: [PATCH bpf-next 09/10] selftests/bpf: Add benchmark for qp-trie map
Date: Sat, 17 Sep 2022 23:31:24 +0800
Message-Id: <20220917153125.2001645-10-houtao@huaweicloud.com>
X-Mailer: git-send-email 2.29.2
In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com>
References: <20220917153125.2001645-1-houtao@huaweicloud.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Hou Tao

Add a benchmark for qp-trie map to compare its lookup and update/delete performance and its memory usage against the hash table.
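For reference, the benchmark added below can be driven directly once built (the bench names and the --file/--entries flags are defined in this patch; the data file format is the number of keys on the first line followed by one key per line):

  # 16K randomly-generated binary keys, 4 producer threads
  ./bench qp-trie-lookup --entries 16384 -p 4
  # keys read from a file, e.g. symbols dumped from /proc/kallsyms
  ./bench htab-update --file /tmp/kallsyms.txt -p 1

or through benchs/run_bench_qp_trie.sh, which sweeps both maps and both operations over 1 to 16 producer threads.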
When the contents of the keys are uniformly distributed and there are large differences in key length, the qp-trie will be dense and have a low height, but the lookup overhead of htab increases due to unnecessary memory comparisons, so the lookup performance of qp-trie is much better than that of the hash table, as shown below:

Randomly-generated binary data (key size=255, max entries=16K, key length range:[1, 255])
htab lookup (1 thread) 4.968 ± 0.009M/s (drops 0.002 ± 0.000M/s mem 8.169 MiB)
htab lookup (2 thread) 10.118 ± 0.010M/s (drops 0.007 ± 0.000M/s mem 8.169 MiB)
htab lookup (4 thread) 20.084 ± 0.022M/s (drops 0.007 ± 0.000M/s mem 8.168 MiB)
htab lookup (8 thread) 39.866 ± 0.047M/s (drops 0.010 ± 0.000M/s mem 8.168 MiB)
htab lookup (16 thread) 79.412 ± 0.065M/s (drops 0.049 ± 0.000M/s mem 8.169 MiB)
htab update (1 thread) 2.122 ± 0.021M/s (drops 0.000 ± 0.000M/s mem 8.169 MiB)
htab update (2 thread) 4.248 ± 0.197M/s (drops 0.000 ± 0.000M/s mem 8.168 MiB)
htab update (4 thread) 8.475 ± 0.348M/s (drops 0.000 ± 0.000M/s mem 8.180 MiB)
htab update (8 thread) 16.725 ± 0.633M/s (drops 0.000 ± 0.000M/s mem 8.208 MiB)
htab update (16 thread) 30.246 ± 0.611M/s (drops 0.000 ± 0.000M/s mem 8.190 MiB)
qp-trie lookup (1 thread) 10.291 ± 0.007M/s (drops 0.004 ± 0.000M/s mem 4.899 MiB)
qp-trie lookup (2 thread) 20.797 ± 0.009M/s (drops 0.006 ± 0.000M/s mem 4.879 MiB)
qp-trie lookup (4 thread) 41.943 ± 0.019M/s (drops 0.015 ± 0.000M/s mem 4.262 MiB)
qp-trie lookup (8 thread) 81.985 ± 0.032M/s (drops 0.025 ± 0.000M/s mem 4.215 MiB)
qp-trie lookup (16 thread) 164.681 ± 0.051M/s (drops 0.050 ± 0.000M/s mem 4.261 MiB)
qp-trie update (1 thread) 1.622 ± 0.016M/s (drops 0.000 ± 0.000M/s mem 4.918 MiB)
qp-trie update (2 thread) 2.688 ± 0.021M/s (drops 0.000 ± 0.000M/s mem 4.874 MiB)
qp-trie update (4 thread) 4.062 ± 0.128M/s (drops 0.000 ± 0.000M/s mem 4.218 MiB)
qp-trie update (8 thread) 7.037 ± 0.247M/s (drops 0.000 ± 0.000M/s mem 4.900 MiB)
qp-trie update (16 thread) 11.024 ± 0.295M/s (drops 0.000 ± 0.000M/s mem 4.830 MiB)

For the strings in /proc/kallsyms, single-thread lookup performance is about ~27% slower than the hash table. As the number of threads increases, lookup performance is almost the same.
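The memory numbers above mostly reflect the different allocation schemes; a rough back-of-the-envelope estimate, assuming the usual htab element layout and the qp-trie leaf layout in which key and value are allocated together (see qp_trie_init_leaf_node()):

  per htab element: ~ sizeof(struct htab_elem) + round_up(key_size, 8) + value bytes,
                    i.e. the full 256-byte key buffer is reserved for every element
  per qp-trie leaf: ~ sizeof(struct qp_trie_key) + actual key length + value bytes,
                    i.e. ~128 key bytes on average for lengths uniform in [1, 255],
                    plus interior qp_trie_branch nodes shared between keys

which is consistent with the ~8.2 MiB vs ~4.2-4.9 MiB slab figures for the random data set.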
But the update and deletion performance of qp-trie is much worse than that of the hash table, as shown below (most likely because each update or deletion copies the parent branch node, frees the old copy through RCU, and serializes with other writers to the same subtree on a single spinlock):

Strings in /proc/kallsyms (key size=83, max entries=170958)
htab lookup (1 thread) 5.686 ± 0.008M/s (drops 0.345 ± 0.002M/s mem 30.840 MiB)
htab lookup (2 thread) 10.147 ± 0.067M/s (drops 0.616 ± 0.005M/s mem 30.841 MiB)
htab lookup (4 thread) 16.503 ± 0.025M/s (drops 1.002 ± 0.004M/s mem 30.841 MiB)
htab lookup (8 thread) 33.429 ± 0.146M/s (drops 2.028 ± 0.020M/s mem 30.848 MiB)
htab lookup (16 thread) 67.249 ± 0.577M/s (drops 4.085 ± 0.032M/s mem 30.841 MiB)
htab update (1 thread) 3.135 ± 0.355M/s (drops 0.000 ± 0.000M/s mem 30.842 MiB)
htab update (2 thread) 6.269 ± 0.686M/s (drops 0.000 ± 0.000M/s mem 30.841 MiB)
htab update (4 thread) 11.607 ± 1.632M/s (drops 0.000 ± 0.000M/s mem 30.840 MiB)
htab update (8 thread) 23.041 ± 0.806M/s (drops 0.000 ± 0.000M/s mem 30.842 MiB)
htab update (16 thread) 31.393 ± 0.307M/s (drops 0.000 ± 0.000M/s mem 30.835 MiB)
qp-trie lookup (1 thread) 4.122 ± 0.010M/s (drops 0.250 ± 0.002M/s mem 30.108 MiB)
qp-trie lookup (2 thread) 9.119 ± 0.057M/s (drops 0.554 ± 0.004M/s mem 17.422 MiB)
qp-trie lookup (4 thread) 16.605 ± 0.032M/s (drops 1.008 ± 0.006M/s mem 17.203 MiB)
qp-trie lookup (8 thread) 33.461 ± 0.058M/s (drops 2.032 ± 0.004M/s mem 16.977 MiB)
qp-trie lookup (16 thread) 67.466 ± 0.145M/s (drops 4.097 ± 0.019M/s mem 17.452 MiB)
qp-trie update (1 thread) 1.191 ± 0.093M/s (drops 0.000 ± 0.000M/s mem 17.170 MiB)
qp-trie update (2 thread) 2.057 ± 0.041M/s (drops 0.000 ± 0.000M/s mem 17.058 MiB)
qp-trie update (4 thread) 2.975 ± 0.035M/s (drops 0.000 ± 0.000M/s mem 17.411 MiB)
qp-trie update (8 thread) 3.596 ± 0.031M/s (drops 0.000 ± 0.000M/s mem 17.110 MiB)
qp-trie update (16 thread) 4.200 ± 0.048M/s (drops 0.000 ± 0.000M/s mem 17.228 MiB)

For strings in the BTF string section, the results are similar:

Sorted strings in BTF string sections (key size=71, max entries=115980)
htab lookup (1 thread) 6.990 ± 0.050M/s (drops 0.000 ± 0.000M/s mem 22.227 MiB)
htab lookup (2 thread) 12.729 ± 0.059M/s (drops 0.000 ± 0.000M/s mem 22.224 MiB)
htab lookup (4 thread) 21.202 ± 0.099M/s (drops 0.000 ± 0.000M/s mem 22.218 MiB)
htab lookup (8 thread) 43.418 ± 0.169M/s (drops 0.000 ± 0.000M/s mem 22.225 MiB)
htab lookup (16 thread) 88.745 ± 0.410M/s (drops 0.000 ± 0.000M/s mem 22.224 MiB)
htab update (1 thread) 3.238 ± 0.271M/s (drops 0.000 ± 0.000M/s mem 22.228 MiB)
htab update (2 thread) 6.483 ± 0.821M/s (drops 0.000 ± 0.000M/s mem 22.227 MiB)
htab update (4 thread) 12.702 ± 0.924M/s (drops 0.000 ± 0.000M/s mem 22.226 MiB)
htab update (8 thread) 22.167 ± 1.269M/s (drops 0.000 ± 0.000M/s mem 22.229 MiB)
htab update (16 thread) 31.225 ± 0.475M/s (drops 0.000 ± 0.000M/s mem 22.239 MiB)
qp-trie lookup (1 thread) 6.729 ± 0.006M/s (drops 0.000 ± 0.000M/s mem 11.335 MiB)
qp-trie lookup (2 thread) 13.417 ± 0.010M/s (drops 0.000 ± 0.000M/s mem 11.287 MiB)
qp-trie lookup (4 thread) 26.399 ± 0.043M/s (drops 0.000 ± 0.000M/s mem 11.111 MiB)
qp-trie lookup (8 thread) 52.910 ± 0.049M/s (drops 0.000 ± 0.000M/s mem 11.131 MiB)
qp-trie lookup (16 thread) 105.444 ± 0.064M/s (drops 0.000 ± 0.000M/s mem 11.060 MiB)
qp-trie update (1 thread) 1.508 ± 0.102M/s (drops 0.000 ± 0.000M/s mem 10.979 MiB)
qp-trie update (2 thread) 2.877 ± 0.034M/s (drops 0.000 ± 0.000M/s mem 10.843 MiB)
qp-trie update (4 thread) 5.111 ± 0.083M/s (drops 0.000 ± 0.000M/s mem 10.938 MiB)
qp-trie update (8 thread) 9.229 ± 0.046M/s (drops 0.000 ± 0.000M/s mem 11.042 MiB)
qp-trie update (16 thread) 11.625 ± 0.147M/s (drops
0.000 ± 0.000M/s mem 10.930 MiB) Signed-off-by: Hou Tao --- tools/testing/selftests/bpf/Makefile | 5 +- tools/testing/selftests/bpf/bench.c | 10 + .../selftests/bpf/benchs/bench_qp_trie.c | 511 ++++++++++++++++++ .../selftests/bpf/benchs/run_bench_qp_trie.sh | 55 ++ .../selftests/bpf/progs/qp_trie_bench.c | 236 ++++++++ 5 files changed, 816 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/benchs/bench_qp_trie.c create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh create mode 100644 tools/testing/selftests/bpf/progs/qp_trie_bench.c diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 5f42adddbbb0..42ea2bc91d3b 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -577,11 +577,13 @@ $(OUTPUT)/bench_strncmp.o: $(OUTPUT)/strncmp_bench.skel.h $(OUTPUT)/bench_bpf_hashmap_full_update.o: $(OUTPUT)/bpf_hashmap_full_update_bench.skel.h $(OUTPUT)/bench_local_storage.o: $(OUTPUT)/local_storage_bench.skel.h $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o: $(OUTPUT)/local_storage_rcu_tasks_trace_bench.skel.h +$(OUTPUT)/bench_qp_trie.o: $(OUTPUT)/qp_trie_bench.skel.h $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ) $(OUTPUT)/bench: LDLIBS += -lm $(OUTPUT)/bench: $(OUTPUT)/bench.o \ $(TESTING_HELPERS) \ $(TRACE_HELPERS) \ + $(CGROUP_HELPERS) \ $(OUTPUT)/bench_count.o \ $(OUTPUT)/bench_rename.o \ $(OUTPUT)/bench_trigger.o \ @@ -591,7 +593,8 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \ $(OUTPUT)/bench_strncmp.o \ $(OUTPUT)/bench_bpf_hashmap_full_update.o \ $(OUTPUT)/bench_local_storage.o \ - $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o + $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o \ + $(OUTPUT)/bench_qp_trie.o $(call msg,BINARY,,$@) $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@ diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c index c1f20a147462..618f45fbe6e2 100644 --- a/tools/testing/selftests/bpf/bench.c +++ b/tools/testing/selftests/bpf/bench.c @@ -275,6 +275,7 @@ extern struct argp bench_bpf_loop_argp; extern struct argp bench_local_storage_argp; extern struct argp bench_local_storage_rcu_tasks_trace_argp; extern struct argp bench_strncmp_argp; +extern struct argp bench_qp_trie_argp; static const struct argp_child bench_parsers[] = { { &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 }, @@ -284,6 +285,7 @@ static const struct argp_child bench_parsers[] = { { &bench_strncmp_argp, 0, "bpf_strncmp helper benchmark", 0 }, { &bench_local_storage_rcu_tasks_trace_argp, 0, "local_storage RCU Tasks Trace slowdown benchmark", 0 }, + { &bench_qp_trie_argp, 0, "qp-trie benchmark", 0 }, {}, }; @@ -490,6 +492,10 @@ extern const struct bench bench_local_storage_cache_seq_get; extern const struct bench bench_local_storage_cache_interleaved_get; extern const struct bench bench_local_storage_cache_hashmap_control; extern const struct bench bench_local_storage_tasks_trace; +extern const struct bench bench_htab_lookup; +extern const struct bench bench_qp_trie_lookup; +extern const struct bench bench_htab_update; +extern const struct bench bench_qp_trie_update; static const struct bench *benchs[] = { &bench_count_global, @@ -529,6 +535,10 @@ static const struct bench *benchs[] = { &bench_local_storage_cache_interleaved_get, &bench_local_storage_cache_hashmap_control, &bench_local_storage_tasks_trace, + &bench_htab_lookup, + &bench_qp_trie_lookup, + &bench_htab_update, + &bench_qp_trie_update, }; static void 
setup_benchmark() diff --git a/tools/testing/selftests/bpf/benchs/bench_qp_trie.c b/tools/testing/selftests/bpf/benchs/bench_qp_trie.c new file mode 100644 index 000000000000..ea38e272b5e4 --- /dev/null +++ b/tools/testing/selftests/bpf/benchs/bench_qp_trie.c @@ -0,0 +1,511 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include "bench.h" +#include "bpf_util.h" +#include "cgroup_helpers.h" + +#include "qp_trie_bench.skel.h" + +enum { + FOR_HTAB = 0, + FOR_TRIE, +}; + +static struct qp_trie_ctx { + struct qp_trie_bench *skel; + int cgrp_dfd; + u64 map_slab_mem; +} ctx; + +static struct { + const char *file; + __u32 entries; +} args; + +struct qp_trie_key { + __u32 len; + unsigned char data[0]; +}; + +struct run_stat { + __u64 stats[2]; +}; + +enum { + ARG_DATA_FILE = 8001, + ARG_DATA_ENTRIES = 8002, +}; + +static const struct argp_option opts[] = { + { "file", ARG_DATA_FILE, "DATA-FILE", 0, "Set data file" }, + { "entries", ARG_DATA_ENTRIES, "DATA-ENTRIES", 0, "Set data entries" }, + {}, +}; + +static error_t qp_trie_parse_arg(int key, char *arg, struct argp_state *state) +{ + switch (key) { + case ARG_DATA_FILE: + args.file = strdup(arg); + break; + case ARG_DATA_ENTRIES: + args.entries = strtoul(arg, NULL, 10); + break; + default: + return ARGP_ERR_UNKNOWN; + } + + return 0; +} + +const struct argp bench_qp_trie_argp = { + .options = opts, + .parser = qp_trie_parse_arg, +}; + +static int parse_data_set(const char *name, struct qp_trie_key ***set, unsigned int *nr, + unsigned int *max_len) +{ +#define INT_MAX_DATA_SIZE 1024 + unsigned int i, nr_items, item_max_len; + char line[INT_MAX_DATA_SIZE + 1]; + struct qp_trie_key **items; + struct qp_trie_key *cur; + int err = 0; + FILE *file; + char *got; + + file = fopen(name, "rb"); + if (!file) { + fprintf(stderr, "open %s err %s\n", name, strerror(errno)); + return -1; + } + + got = fgets(line, sizeof(line), file); + if (!got) { + fprintf(stderr, "empty file ?\n"); + err = -1; + goto out; + } + if (sscanf(line, "%u", &nr_items) != 1) { + fprintf(stderr, "the first line must be the number of items\n"); + err = -1; + goto out; + } + + fprintf(stdout, "item %u\n", nr_items); + + items = (struct qp_trie_key **)calloc(nr_items, sizeof(*items) + INT_MAX_DATA_SIZE); + if (!items) { + fprintf(stderr, "no mem for items\n"); + err = -1; + goto out; + } + + i = 0; + item_max_len = 0; + cur = (void *)items + sizeof(*items) * nr_items; + while (true) { + unsigned int len; + + got = fgets(line, sizeof(line), file); + if (!got) { + if (!feof(file)) { + fprintf(stderr, "read file %s error\n", name); + err = -1; + } + break; + } + + len = strlen(got); + if (len && got[len - 1] == '\n') { + got[len - 1] = 0; + len -= 1; + } + if (!len) { + fprintf(stdout, "#%u empty line\n", i + 2); + continue; + } + + if (i >= nr_items) { + fprintf(stderr, "too many line in %s\n", name); + break; + } + + if (len > item_max_len) + item_max_len = len; + cur->len = len; + memcpy(cur->data, got, len); + items[i++] = cur; + cur = (void *)cur + INT_MAX_DATA_SIZE; + } + + if (!err) { + if (i != nr_items) + fprintf(stdout, "few lines in %s (exp %u got %u)\n", name, nr_items, i); + *nr = i; + *set = items; + *max_len = item_max_len; + } else { + free(items); + } + +out: + fclose(file); + return err; +} + +static int gen_data_set(struct qp_trie_key ***set, unsigned int *nr, unsigned int *max_len) +{ +#define RND_MAX_DATA_SIZE 255 + struct qp_trie_key **items; + size_t ptr_size, data_size; + struct 
qp_trie_key *cur; + unsigned int i, nr_items; + ssize_t got; + int err = 0; + + ptr_size = *nr * sizeof(*items); + data_size = *nr * (sizeof(*cur) + RND_MAX_DATA_SIZE); + items = (struct qp_trie_key **)malloc(ptr_size + data_size); + if (!items) { + fprintf(stderr, "no mem for items\n"); + err = -1; + goto out; + } + + cur = (void *)items + ptr_size; + got = syscall(__NR_getrandom, cur, data_size, 0); + if (got != data_size) { + fprintf(stderr, "getrandom error %s\n", strerror(errno)); + err = -1; + goto out; + } + + nr_items = 0; + for (i = 0; i < *nr; i++) { + cur->len &= 0xff; + if (cur->len) { + items[nr_items++] = cur; + memset(cur->data + cur->len, 0, RND_MAX_DATA_SIZE - cur->len); + } + cur = (void *)cur + (sizeof(*cur) + RND_MAX_DATA_SIZE); + } + if (!nr_items) { + fprintf(stderr, "no valid key in random data\n"); + err = -1; + goto out; + } + fprintf(stdout, "generate %u random keys\n", nr_items); + + *nr = nr_items; + *set = items; + *max_len = RND_MAX_DATA_SIZE; +out: + if (err && items) + free(items); + return err; +} + +static void qp_trie_validate(void) +{ + if (env.consumer_cnt != 1) { + fprintf(stderr, "qp_trie_map benchmark doesn't support multi-consumer!\n"); + exit(1); + } + + if (!args.file && !args.entries) { + fprintf(stderr, "must specify entries when use random generated data set\n"); + exit(1); + } + + if (args.file && access(args.file, R_OK)) { + fprintf(stderr, "data file is un-accessible\n"); + exit(1); + } +} + +static void qp_trie_init_map_opts(struct qp_trie_bench *skel, unsigned int data_size, + unsigned int nr) +{ + bpf_map__set_value_size(skel->maps.htab_array, data_size); + bpf_map__set_max_entries(skel->maps.htab_array, nr); + + bpf_map__set_key_size(skel->maps.htab, data_size); + bpf_map__set_max_entries(skel->maps.htab, nr); + + bpf_map__set_value_size(skel->maps.trie_array, sizeof(struct qp_trie_key) + data_size); + bpf_map__set_max_entries(skel->maps.trie_array, nr); + + bpf_map__set_map_extra(skel->maps.qp_trie, data_size); + bpf_map__set_max_entries(skel->maps.qp_trie, nr); +} + +static void qp_trie_setup_key_map(struct bpf_map *map, unsigned int map_type, + struct qp_trie_key **set, unsigned int nr) +{ + int fd = bpf_map__fd(map); + unsigned int i; + + for (i = 0; i < nr; i++) { + void *value; + int err; + + value = (map_type != FOR_HTAB) ? 
(void *)set[i] : (void *)set[i]->data; + err = bpf_map_update_elem(fd, &i, value, 0); + if (err) { + fprintf(stderr, "add #%u key (%s) on %s error %d\n", + i, set[i]->data, bpf_map__name(map), err); + exit(1); + } + } +} + +static u64 qp_trie_get_slab_mem(int dfd) +{ + const char *magic = "slab "; + const char *name = "memory.stat"; + int fd; + ssize_t nr; + char buf[4096]; + char *from; + + fd = openat(dfd, name, 0); + if (fd < 0) { + fprintf(stderr, "no %s\n", name); + exit(1); + } + + nr = read(fd, buf, sizeof(buf)); + if (nr <= 0) { + fprintf(stderr, "empty %s ?\n", name); + exit(1); + } + buf[nr - 1] = 0; + + close(fd); + + from = strstr(buf, magic); + if (!from) { + fprintf(stderr, "no slab in %s\n", name); + exit(1); + } + + return strtoull(from + strlen(magic), NULL, 10); +} + +static void qp_trie_setup_lookup_map(struct bpf_map *map, unsigned int map_type, + struct qp_trie_key **set, unsigned int nr) +{ + int fd = bpf_map__fd(map); + unsigned int i; + + for (i = 0; i < nr; i++) { + int err; + + if (map_type == FOR_HTAB) { + void *key; + + key = set[i]->data; + err = bpf_map_update_elem(fd, key, &i, 0); + } else { + struct bpf_dynptr_user dynptr; + + bpf_dynptr_user_init(set[i]->data, set[i]->len, &dynptr); + err = bpf_map_update_elem(fd, &dynptr, &i, 0); + } + if (err) { + fprintf(stderr, "add #%u key (%s) on %s error %d\n", + i, set[i]->data, bpf_map__name(map), err); + exit(1); + } + } +} + +static void qp_trie_setup(unsigned int map_type) +{ + struct qp_trie_key **set = NULL; + struct qp_trie_bench *skel; + unsigned int nr = 0, max_len = 0; + struct bpf_map *map; + u64 before, after; + int dfd; + int err; + + if (!args.file) { + nr = args.entries; + err = gen_data_set(&set, &nr, &max_len); + } else { + err = parse_data_set(args.file, &set, &nr, &max_len); + } + if (err < 0) + exit(1); + + if (args.entries && args.entries < nr) + nr = args.entries; + + dfd = cgroup_setup_and_join("/qp_trie"); + if (dfd < 0) { + fprintf(stderr, "failed to setup cgroup env\n"); + exit(1); + } + + setup_libbpf(); + + before = qp_trie_get_slab_mem(dfd); + + skel = qp_trie_bench__open(); + if (!skel) { + fprintf(stderr, "failed to open skeleton\n"); + exit(1); + } + + qp_trie_init_map_opts(skel, max_len, nr); + + skel->rodata->qp_trie_key_size = max_len; + skel->bss->update_nr = nr; + skel->bss->update_chunk = nr / env.producer_cnt; + + err = qp_trie_bench__load(skel); + if (err) { + fprintf(stderr, "failed to load skeleton\n"); + exit(1); + } + + map = (map_type == FOR_HTAB) ? skel->maps.htab_array : skel->maps.trie_array; + qp_trie_setup_key_map(map, map_type, set, nr); + + map = (map_type == FOR_HTAB) ? 
skel->maps.htab : skel->maps.qp_trie; + qp_trie_setup_lookup_map(map, map_type, set, nr); + + after = qp_trie_get_slab_mem(dfd); + + ctx.skel = skel; + ctx.cgrp_dfd = dfd; + ctx.map_slab_mem = after - before; +} + +static void qp_trie_attach_prog(struct bpf_program *prog) +{ + struct bpf_link *link; + + link = bpf_program__attach(prog); + if (!link) { + fprintf(stderr, "failed to attach program!\n"); + exit(1); + } +} + +static void htab_lookup_setup(void) +{ + qp_trie_setup(FOR_HTAB); + qp_trie_attach_prog(ctx.skel->progs.htab_lookup); +} + +static void qp_trie_lookup_setup(void) +{ + qp_trie_setup(FOR_TRIE); + qp_trie_attach_prog(ctx.skel->progs.qp_trie_lookup); +} + +static void htab_update_setup(void) +{ + qp_trie_setup(FOR_HTAB); + qp_trie_attach_prog(ctx.skel->progs.htab_update); +} + +static void qp_trie_update_setup(void) +{ + qp_trie_setup(FOR_TRIE); + qp_trie_attach_prog(ctx.skel->progs.qp_trie_update); +} + +static void *qp_trie_producer(void *ctx) +{ + while (true) + (void)syscall(__NR_getpgid); + return NULL; +} + +static void *qp_trie_consumer(void *ctx) +{ + return NULL; +} + +static void qp_trie_measure(struct bench_res *res) +{ + static __u64 last_hits, last_drops; + __u64 total_hits = 0, total_drops = 0; + unsigned int i, nr_cpus; + + nr_cpus = bpf_num_possible_cpus(); + for (i = 0; i < nr_cpus; i++) { + struct run_stat *s = (void *)&ctx.skel->bss->percpu_stats[i & 255]; + + total_hits += s->stats[0]; + total_drops += s->stats[1]; + } + + res->hits = total_hits - last_hits; + res->drops = total_drops - last_drops; + + last_hits = total_hits; + last_drops = total_drops; +} + +static void qp_trie_report_final(struct bench_res res[], int res_cnt) +{ + close(ctx.cgrp_dfd); + cleanup_cgroup_environment(); + + fprintf(stdout, "Slab: %.3f MiB\n", (float)ctx.map_slab_mem / 1024 / 1024); + hits_drops_report_final(res, res_cnt); +} + +const struct bench bench_htab_lookup = { + .name = "htab-lookup", + .validate = qp_trie_validate, + .setup = htab_lookup_setup, + .producer_thread = qp_trie_producer, + .consumer_thread = qp_trie_consumer, + .measure = qp_trie_measure, + .report_progress = hits_drops_report_progress, + .report_final = qp_trie_report_final, +}; + +const struct bench bench_qp_trie_lookup = { + .name = "qp-trie-lookup", + .validate = qp_trie_validate, + .setup = qp_trie_lookup_setup, + .producer_thread = qp_trie_producer, + .consumer_thread = qp_trie_consumer, + .measure = qp_trie_measure, + .report_progress = hits_drops_report_progress, + .report_final = qp_trie_report_final, +}; + +const struct bench bench_htab_update = { + .name = "htab-update", + .validate = qp_trie_validate, + .setup = htab_update_setup, + .producer_thread = qp_trie_producer, + .consumer_thread = qp_trie_consumer, + .measure = qp_trie_measure, + .report_progress = hits_drops_report_progress, + .report_final = qp_trie_report_final, +}; + +const struct bench bench_qp_trie_update = { + .name = "qp-trie-update", + .validate = qp_trie_validate, + .setup = qp_trie_update_setup, + .producer_thread = qp_trie_producer, + .consumer_thread = qp_trie_consumer, + .measure = qp_trie_measure, + .report_progress = hits_drops_report_progress, + .report_final = qp_trie_report_final, +}; diff --git a/tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh b/tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh new file mode 100755 index 000000000000..0cbcb5bc9292 --- /dev/null +++ b/tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh @@ -0,0 +1,55 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright 
(C) 2022. Huawei Technologies Co., Ltd + +source ./benchs/run_common.sh + +set -eufo pipefail + +mem() +{ + echo "$*" | sed -E "s/.*Slab: ([0-9]+\.[0-9]+ MiB).*/\1/" +} + +run_qp_trie_bench() +{ + local title=$1 + local summary + + shift 1 + summary=$($RUN_BENCH "$@" | grep "Summary\|Slab:") + printf "%s %20s (drops %-16s mem %s)\n" "$title" "$(hits $summary)" \ + "$(drops $summary)" "$(mem $summary)" +} + +run_qp_trie_benchs() +{ + local p + local m + local b + local title + + for m in htab qp-trie + do + for b in lookup update + do + for p in 1 2 4 8 16 + do + title=$(printf "%-16s (%-2d thread)" "$m $b" $p) + run_qp_trie_bench "$title" ${m}-${b} -p $p "$@" + done + done + done + echo +} + +echo "Randomly-generated binary data (16K)" +run_qp_trie_benchs --entries 16384 + +echo "Strings in /proc/kallsyms" +TMP_FILE=/tmp/kallsyms.txt +SRC_FILE=/proc/kallsyms +trap 'rm -f $TMP_FILE' EXIT +wc -l $SRC_FILE | awk '{ print $1}' > $TMP_FILE +awk '{ print $3 }' $SRC_FILE >> $TMP_FILE +run_qp_trie_benchs --file $TMP_FILE diff --git a/tools/testing/selftests/bpf/progs/qp_trie_bench.c b/tools/testing/selftests/bpf/progs/qp_trie_bench.c new file mode 100644 index 000000000000..b60acb7b9f94 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/qp_trie_bench.c @@ -0,0 +1,236 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include + +struct bpf_map; + +struct qp_trie_key { + __u32 len; + unsigned char data[0]; +}; + +/* value_size will be set by benchmark */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(key_size, 4); +} htab_array SEC(".maps"); + +/* value_size will be set by benchmark */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(key_size, 4); +} trie_array SEC(".maps"); + +/* key_size will be set by benchmark */ +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(value_size, 4); + __uint(map_flags, BPF_F_NO_PREALLOC); +} htab SEC(".maps"); + +/* map_extra will be set by benchmark */ +struct { + __uint(type, BPF_MAP_TYPE_QP_TRIE); + __type(key, struct bpf_dynptr); + __type(value, unsigned int); + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY); +} qp_trie SEC(".maps"); + +char _license[] SEC("license") = "GPL"; + +struct { + __u64 stats[2]; +} __attribute__((__aligned__(128))) percpu_stats[256]; + +struct update_ctx { + unsigned int max; + unsigned int from; +}; + +volatile const unsigned int qp_trie_key_size; + +unsigned int update_nr; +unsigned int update_chunk; + +static __always_inline void update_stats(int idx) +{ + __u32 cpu = bpf_get_smp_processor_id(); + + percpu_stats[cpu & 255].stats[idx]++; +} + +static int lookup_htab(struct bpf_map *map, __u32 *key, void *value, void *data) +{ + __u32 *index; + + index = bpf_map_lookup_elem(&htab, value); + if (index && *index == *key) + update_stats(0); + else + update_stats(1); + return 0; +} + +static int update_htab_loop(unsigned int i, void *ctx) +{ + struct update_ctx *update = ctx; + void *value; + int err; + + if (update->from >= update->max) + update->from = 0; + value = bpf_map_lookup_elem(&htab_array, &update->from); + if (!value) + return 1; + + err = bpf_map_update_elem(&htab, value, &update->from, 0); + if (!err) + update_stats(0); + else + update_stats(1); + update->from++; + + return 0; +} + +static int delete_htab_loop(unsigned int i, void *ctx) +{ + struct update_ctx *update = ctx; + void *value; + int err; + + if (update->from >= update->max) + update->from = 0; + value = bpf_map_lookup_elem(&htab_array, 
&update->from); + if (!value) + return 1; + + err = bpf_map_delete_elem(&htab, value); + if (!err) + update_stats(0); + update->from++; + + return 0; +} + +static int lookup_qp_trie(struct bpf_map *map, __u32 *key, void *value, void *data) +{ + struct qp_trie_key *qp_trie_key = value; + struct bpf_dynptr dynptr; + __u32 *index; + + if (qp_trie_key->len > qp_trie_key_size) + return 0; + + bpf_dynptr_from_mem(qp_trie_key->data, qp_trie_key->len, 0, &dynptr); + index = bpf_map_lookup_elem(&qp_trie, &dynptr); + if (index && *index == *key) + update_stats(0); + else + update_stats(1); + return 0; +} + +static int update_qp_trie_loop(unsigned int i, void *ctx) +{ + struct update_ctx *update = ctx; + struct qp_trie_key *value; + struct bpf_dynptr dynptr; + int err; + + if (update->from >= update->max) + update->from = 0; + value = bpf_map_lookup_elem(&trie_array, &update->from); + if (!value || value->len > qp_trie_key_size) + return 1; + + bpf_dynptr_from_mem(value->data, value->len, 0, &dynptr); + err = bpf_map_update_elem(&qp_trie, &dynptr, &update->from, 0); + if (!err) + update_stats(0); + else + update_stats(1); + update->from++; + + return 0; +} + +static int delete_qp_trie_loop(unsigned int i, void *ctx) +{ + struct update_ctx *update = ctx; + struct qp_trie_key *value; + struct bpf_dynptr dynptr; + int err; + + if (update->from >= update->max) + update->from = 0; + value = bpf_map_lookup_elem(&trie_array, &update->from); + if (!value || value->len > qp_trie_key_size) + return 1; + + bpf_dynptr_from_mem(value->data, value->len, 0, &dynptr); + err = bpf_map_delete_elem(&qp_trie, &dynptr); + if (!err) + update_stats(0); + update->from++; + + return 0; +} + +SEC("tp/syscalls/sys_enter_getpgid") +int htab_lookup(void *ctx) +{ + bpf_for_each_map_elem(&htab_array, lookup_htab, NULL, 0); + return 0; +} + +SEC("tp/syscalls/sys_enter_getpgid") +int qp_trie_lookup(void *ctx) +{ + bpf_for_each_map_elem(&trie_array, lookup_qp_trie, NULL, 0); + return 0; +} + +SEC("tp/syscalls/sys_enter_getpgid") +int htab_update(void *ctx) +{ + unsigned int index = bpf_get_smp_processor_id() * update_chunk; + struct update_ctx update; + + update.max = update_nr; + if (update.max && index >= update.max) + index %= update.max; + + /* Only operate part of keys according to cpu id */ + update.from = index; + bpf_loop(update_chunk, update_htab_loop, &update, 0); + + update.from = index; + bpf_loop(update_chunk, delete_htab_loop, &update, 0); + + return 0; +} + +SEC("tp/syscalls/sys_enter_getpgid") +int qp_trie_update(void *ctx) +{ + unsigned int index = bpf_get_smp_processor_id() * update_chunk; + struct update_ctx update; + + update.max = update_nr; + if (update.max && index >= update.max) + index %= update.max; + + /* Only operate part of keys according to cpu id */ + update.from = index; + bpf_loop(update_chunk, update_qp_trie_loop, &update, 0); + + update.from = index; + bpf_loop(update_chunk, delete_qp_trie_loop, &update, 0); + + return 0; +} From patchwork Sat Sep 17 15:31:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 12979184 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F4B0C6FA82 for ; Sat, 17 Sep 2022 15:13:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand 
id S229473AbiIQPNg (ORCPT ); Sat, 17 Sep 2022 11:13:36 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42190 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229618AbiIQPNd (ORCPT ); Sat, 17 Sep 2022 11:13:33 -0400
Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B765A201A7 for ; Sat, 17 Sep 2022 08:13:30 -0700 (PDT)
Received: from mail02.huawei.com (unknown [172.30.67.143]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4MVDtZ1X6ZzlCm0 for ; Sat, 17 Sep 2022 23:11:50 +0800 (CST)
Received: from huaweicloud.com (unknown [10.175.124.27]) by APP2 (Coremail) with SMTP id Syh0CgAnenMO5CVjskryAw--.61987S14; Sat, 17 Sep 2022 23:13:28 +0800 (CST)
From: Hou Tao
To: bpf@vger.kernel.org, Andrii Nakryiko
Cc: Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , Martin KaFai Lau , KP Singh , "David S . Miller" , Jakub Kicinski , Stanislav Fomichev , Jiri Olsa , John Fastabend , Lorenz Bauer , "Paul E . McKenney" , houtao1@huawei.com
Subject: [PATCH bpf-next 10/10] selftests/bpf: Add tests for qp-trie map by using bpf syscalls
Date: Sat, 17 Sep 2022 23:31:25 +0800
Message-Id: <20220917153125.2001645-11-houtao@huaweicloud.com>
X-Mailer: git-send-email 2.29.2
In-Reply-To: <20220917153125.2001645-1-houtao@huaweicloud.com>
References: <20220917153125.2001645-1-houtao@huaweicloud.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Hou Tao

Add test cases for creation, lookup, update, deletion and iteration on qp-trie by using bpf syscalls. Also check that the keys returned during iteration are ordered.

Signed-off-by: Hou Tao
---
 .../selftests/bpf/map_tests/qp_trie_map.c | 1165 +++++++++++++++++
 1 file changed, 1165 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/map_tests/qp_trie_map.c

diff --git a/tools/testing/selftests/bpf/map_tests/qp_trie_map.c b/tools/testing/selftests/bpf/map_tests/qp_trie_map.c
new file mode 100644
index 000000000000..27d0805c3049
--- /dev/null
+++ b/tools/testing/selftests/bpf/map_tests/qp_trie_map.c
@@ -0,0 +1,1165 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. 
 .../selftests/bpf/map_tests/qp_trie_map.c | 1165 +++++++++++++++++
 1 file changed, 1165 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/map_tests/qp_trie_map.c

diff --git a/tools/testing/selftests/bpf/map_tests/qp_trie_map.c b/tools/testing/selftests/bpf/map_tests/qp_trie_map.c
new file mode 100644
index 000000000000..27d0805c3049
--- /dev/null
+++ b/tools/testing/selftests/bpf/map_tests/qp_trie_map.c
@@ -0,0 +1,1165 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+#include
+#include
+
+#include "bpf_util.h"
+
+#define QP_TRIE_KEY_SIZE sizeof(struct bpf_dynptr)
+#define QP_TRIE_DFT_MAX_KEY_LEN 4
+#define QP_TRIE_DFT_VAL_SIZE 4
+#define QP_TRIE_DFT_MAP_FLAGS (BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY)
+
+#define QP_TRIE_DFT_BTF_KEY_ID 1
+#define QP_TRIE_DFT_BTF_VAL_ID 2
+
+struct qp_trie_create_case {
+	const char *name;
+	int error;
+	unsigned int map_flags;
+	unsigned int max_key_len;
+	unsigned int value_size;
+	unsigned int max_entries;
+	unsigned int btf_key_type_id;
+	unsigned int btf_value_type_id;
+};
+
+struct qp_trie_bytes_key {
+	unsigned int len;
+	unsigned char data[4];
+};
+
+struct qp_trie_int_key {
+	unsigned int len;
+	unsigned int data;
+};
+
+enum {
+	UPDATE_OP = 0,
+	DELETE_OP,
+	LOOKUP_OP,
+	ITERATE_OP,
+	MAX_OP,
+};
+
+struct stress_conf {
+	unsigned int threads[MAX_OP];
+	unsigned int max_key_len;
+	unsigned int loop;
+	unsigned int nr;
+};
+
+struct qp_trie_rw_ctx {
+	unsigned int nr;
+	unsigned int max_key_len;
+	int fd;
+	struct bpf_dynptr_user *set;
+	unsigned int loop;
+	unsigned int nr_delete;
+};
+
+static int qp_trie_load_btf(void)
+{
+	const char btf_str_sec[] = "\0bpf_dynptr\0qp_test_key";
+	__u32 btf_raw_types[] = {
+		/* struct bpf_dynptr */		/* [1] */
+		BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 0), 16),
+		/* unsigned int */		/* [2] */
+		BTF_TYPE_INT_ENC(0, 0, 0, 32, 4),
+		/* struct qp_test_key */	/* [3] */
+		BTF_TYPE_ENC(12, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 0), 16),
+	};
+	struct btf_header btf_hdr = {
+		.magic = BTF_MAGIC,
+		.version = BTF_VERSION,
+		.hdr_len = sizeof(struct btf_header),
+		.type_len = sizeof(btf_raw_types),
+		.str_off = sizeof(btf_raw_types),
+		.str_len = sizeof(btf_str_sec),
+	};
+	__u8 raw_btf[sizeof(struct btf_header) + sizeof(btf_raw_types) +
+		     sizeof(btf_str_sec)];
+
+	memcpy(raw_btf, &btf_hdr, sizeof(btf_hdr));
+	memcpy(raw_btf + sizeof(btf_hdr), btf_raw_types, sizeof(btf_raw_types));
+	memcpy(raw_btf + sizeof(btf_hdr) + sizeof(btf_raw_types),
+	       btf_str_sec, sizeof(btf_str_sec));
+
+	return bpf_btf_load(raw_btf, sizeof(raw_btf), NULL);
+}
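
For readers less used to hand-rolled raw BTF, the first two types above could
equally be built with libbpf's BTF writer. A rough sketch, assuming the
libbpf 1.0 btf__add_*() APIs; this helper is illustrative only and not part
of the patch:

    #include <bpf/btf.h>

    static int qp_trie_load_btf_via_writer(void)
    {
    	struct btf *btf = btf__new_empty();

    	if (!btf)
    		return -1;
    	/* [1] struct bpf_dynptr: a 16-byte opaque struct without members */
    	btf__add_struct(btf, "bpf_dynptr", 16);
    	/* [2] unsigned int: 4 bytes, no signed/char/bool encoding flags */
    	btf__add_int(btf, "unsigned int", 4, 0);
    	if (btf__load_into_kernel(btf)) {
    		btf__free(btf);
    		return -1;
    	}
    	/* The returned fd is owned by @btf; keep @btf alive or dup the fd */
    	return btf__fd(btf);
    }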
+
+struct qp_trie_create_case create_cases[] = {
+	{
+		.name = "tiny qp-trie",
+		.error = 0,
+		.map_flags = QP_TRIE_DFT_MAP_FLAGS,
+		.max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+		.value_size = QP_TRIE_DFT_VAL_SIZE,
+		.max_entries = 1,
+		.btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+		.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+	},
+	{
+		.name = "empty qp-trie",
+		.error = -EINVAL,
+		.map_flags = QP_TRIE_DFT_MAP_FLAGS,
+		.max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+		.value_size = QP_TRIE_DFT_VAL_SIZE,
+		.max_entries = 0,
+		.btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+		.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+	},
+	{
+		.name = "preallocated qp-trie",
+		.error = -EINVAL,
+		.map_flags = BPF_F_DYNPTR_KEY,
+		.max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+		.value_size = QP_TRIE_DFT_VAL_SIZE,
+		.max_entries = 1,
+		.btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+		.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+	},
+	{
+		.name = "fixed-size key qp-trie",
+		.error = -EINVAL,
+		.map_flags = BPF_F_NO_PREALLOC,
+		.max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+		.value_size = QP_TRIE_DFT_VAL_SIZE,
+		.max_entries = 1,
+		.btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+		.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+	},
+	{
+		.name = "mmapable qp-trie",
+		.error = -EINVAL,
+		.map_flags = QP_TRIE_DFT_MAP_FLAGS | BPF_F_MMAPABLE,
+		.max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+		.value_size = QP_TRIE_DFT_VAL_SIZE,
+		.max_entries = 1,
+		.btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+		.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+	},
+	{
+		.name = "no btf qp-trie",
+		.error = -EINVAL,
+		.map_flags = QP_TRIE_DFT_MAP_FLAGS,
+		.max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+		.value_size = QP_TRIE_DFT_VAL_SIZE,
+		.max_entries = 1,
+		.btf_key_type_id = 0,
+		.btf_value_type_id = 0,
+	},
+	{
+		.name = "qp_test_key qp-trie",
+		.error = -EINVAL,
+		.map_flags = QP_TRIE_DFT_MAP_FLAGS,
+		.max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+		.value_size = QP_TRIE_DFT_VAL_SIZE,
+		.max_entries = 1,
+		.btf_key_type_id = 3,
+		.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+	},
+	{
+		.name = "zero max key len qp-trie",
+		.error = -EINVAL,
+		.map_flags = QP_TRIE_DFT_MAP_FLAGS,
+		.max_key_len = 0,
+		.value_size = QP_TRIE_DFT_VAL_SIZE,
+		.max_entries = 1,
+		.btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+		.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+	},
+	{
+		.name = "big k-v size qp-trie",
+		.error = -E2BIG,
+		.map_flags = QP_TRIE_DFT_MAP_FLAGS,
+		.max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+		.value_size = 128 << 20,
+		.max_entries = 1,
+		.btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+		.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+	},
+};
+
+static void test_qp_trie_create(void)
+{
+	unsigned int i;
+	int btf_fd;
+
+	btf_fd = qp_trie_load_btf();
+	CHECK(btf_fd < 0, "load btf", "error %d\n", btf_fd);
+
+	for (i = 0; i < ARRAY_SIZE(create_cases); i++) {
+		LIBBPF_OPTS(bpf_map_create_opts, opts);
+		int fd;
+
+		opts.map_flags = create_cases[i].map_flags;
+		opts.btf_fd = btf_fd;
+		opts.btf_key_type_id = create_cases[i].btf_key_type_id;
+		opts.btf_value_type_id = create_cases[i].btf_value_type_id;
+		opts.map_extra = create_cases[i].max_key_len;
+		fd = bpf_map_create(BPF_MAP_TYPE_QP_TRIE, "qp_trie", QP_TRIE_KEY_SIZE,
+				    create_cases[i].value_size, create_cases[i].max_entries, &opts);
+		if (!create_cases[i].error) {
+			CHECK(fd < 0, create_cases[i].name, "error %d\n", fd);
+			close(fd);
+		} else {
+			CHECK(fd != create_cases[i].error, create_cases[i].name,
+			      "expect error %d got %d\n", create_cases[i].error, fd);
+		}
+	}
+
+	close(btf_fd);
+}
+
+static int qp_trie_create(unsigned int max_key_len, unsigned int value_size, unsigned int max_entries)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, opts);
+	int btf_fd, map_fd;
+
+	btf_fd = qp_trie_load_btf();
+	CHECK(btf_fd < 0, "load btf", "error %d\n", btf_fd);
+
+	opts.map_flags = QP_TRIE_DFT_MAP_FLAGS;
+	opts.btf_fd = btf_fd;
+	opts.btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID;
+	opts.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID;
+	opts.map_extra = max_key_len;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_QP_TRIE, "qp_trie", QP_TRIE_KEY_SIZE, value_size,
+				max_entries, &opts);
+	CHECK(map_fd < 0, "bpf_map_create", "error %d\n", map_fd);
+
+	close(btf_fd);
+
+	return map_fd;
+}
+
+static void test_qp_trie_bad_update(void)
+{
+	struct bpf_dynptr_user dynptr;
+	unsigned int key, value;
+	u64 big_key;
+	int fd, err;
+
+	fd = qp_trie_create(sizeof(key), sizeof(value), 1);
+
+	/* Invalid flags (Error) */
+	key = 0;
+	value = 0;
+	bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+	err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST | BPF_EXIST);
+	CHECK(err != -EINVAL, "invalid update flag", "error %d\n", err);
+
+	/* Invalid key len (Error) */
+	big_key = 1;
+	value = 1;
+	bpf_dynptr_user_init(&big_key, sizeof(big_key), &dynptr);
+	err = bpf_map_update_elem(fd, &dynptr, &value, 0);
+	CHECK(err != -EINVAL, "invalid data len", "error %d\n", err);
+
+	/* Iterate an empty qp-trie (Error) */
+	bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+	err = bpf_map_get_next_key(fd, NULL, &dynptr);
+	CHECK(err != -ENOENT, "non-empty qp-trie", "error %d\n", err);
+
+	/* Overwrite an element in an empty qp-trie (Error) */
+	key = 2;
+	value = 2;
+	bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+	err = bpf_map_update_elem(fd, &dynptr, &value, BPF_EXIST);
+	CHECK(err != -ENOENT, "overwrite empty qp-trie", "error %d\n", err);
+
+	/* Iterate an empty qp-trie (Error) */
+	bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+	err = bpf_map_get_next_key(fd, NULL, &dynptr);
+	CHECK(err != -ENOENT, "non-empty qp-trie", "error %d\n", err);
+
+	close(fd);
+}
+
+static void test_qp_trie_bad_lookup_delete(void)
+{
+	struct bpf_dynptr_user dynptr;
+	unsigned int key, value;
+	int fd, err;
+
+	fd = qp_trie_create(sizeof(key), sizeof(value), 2);
+
+	/* Lookup/Delete a non-existent key (Error) */
+	key = 0;
+	bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+	err = bpf_map_delete_elem(fd, &dynptr);
+	CHECK(err != -ENOENT, "del non-existent key", "error %d\n", err);
+	err = bpf_map_lookup_elem(fd, &dynptr, &value);
+	CHECK(err != -ENOENT, "lookup non-existent key", "error %d\n", err);
+
+	key = 0;
+	value = 2;
+	bpf_dynptr_user_init(&key, 2, &dynptr);
+	err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+	CHECK(err, "add elem", "error %d\n", err);
+
+	key = 0;
+	value = 4;
+	bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+	err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+	CHECK(err, "add elem", "error %d\n", err);
+
+	/*
+	 * Lookup/Delete a non-existent key, even though it is a prefix of
+	 * existing keys (Error)
+	 */
+	key = 0;
+	bpf_dynptr_user_init(&key, 1, &dynptr);
+	err = bpf_map_delete_elem(fd, &dynptr);
+	CHECK(err != -ENOENT, "del non-existent key", "error %d\n", err);
+	err = bpf_map_lookup_elem(fd, &dynptr, &value);
+	CHECK(err != -ENOENT, "lookup non-existent key", "error %d\n", err);
+
+	/* Lookup/Delete a non-existent key, even though its prefix exists (Error) */
+	key = 0;
+	bpf_dynptr_user_init(&key, 3, &dynptr);
+	err = bpf_map_delete_elem(fd, &dynptr);
+	CHECK(err != -ENOENT, "del non-existent key", "error %d\n", err);
+	err = bpf_map_lookup_elem(fd, &dynptr, &value);
+	CHECK(err != -ENOENT, "lookup non-existent key", "error %d\n", err);
+
+	close(fd);
+}
+
+static int cmp_str(const void *a, const void *b)
+{
+	const char *str_a = *(const char **)a, *str_b = *(const char **)b;
+
+	return strcmp(str_a, str_b);
+}
+
+static void test_qp_trie_one_subtree_update(void)
+{
+	const char *keys[] = {
+		"ab", "abc", "abo", "abS", "abcd",
+	};
+	const char *sorted_keys[ARRAY_SIZE(keys)];
+	unsigned int value, got, i, j;
+	struct bpf_dynptr_user dynptr;
+	struct bpf_dynptr_user *cur;
+	char data[4];
+	int fd, err;
+
+	fd = qp_trie_create(4, sizeof(value), ARRAY_SIZE(keys));
+
+	for (i = 0; i < ARRAY_SIZE(keys); i++) {
+		unsigned int flags;
+
+		/* Add the i-th element */
+		flags = i % 2 ? BPF_NOEXIST : 0;
+		bpf_dynptr_user_init((void *)keys[i], strlen(keys[i]), &dynptr);
+		value = i + 100;
+		err = bpf_map_update_elem(fd, &dynptr, &value, flags);
+		CHECK(err, "add elem", "#%u error %d\n", i, err);
+
+		err = bpf_map_lookup_elem(fd, &dynptr, &got);
+		CHECK(err, "lookup elem", "#%u error %d\n", i, err);
+		CHECK(got != value, "lookup elem", "#%u expect %u got %u\n", i, value, got);
+
+		/* Re-add the i-th element (Error) */
+		err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+		CHECK(err != -EEXIST, "re-add elem", "#%u error %d\n", i, err);
+
+		/* Overwrite the i-th element */
+		flags = i % 2 ? 0 : BPF_EXIST;
+		value = i;
+		err = bpf_map_update_elem(fd, &dynptr, &value, flags);
+		CHECK(err, "update elem", "error %d\n", err);
+
+		/* Lookup elements #[0~i] */
+		for (j = 0; j <= i; j++) {
+			bpf_dynptr_user_init((void *)keys[j], strlen(keys[j]), &dynptr);
+			err = bpf_map_lookup_elem(fd, &dynptr, &got);
+			CHECK(err, "lookup elem", "#%u/%u error %d\n", i, j, err);
+			CHECK(got != j, "lookup elem", "#%u/%u expect %u got %u\n", i, j, j, got);
+		}
+	}
+
+	/* Add an element to a full qp-trie (Error) */
+	memset(data, 0, sizeof(data));
+	bpf_dynptr_user_init(&data, sizeof(data), &dynptr);
+	value = 0;
+	err = bpf_map_update_elem(fd, &dynptr, &value, 0);
+	CHECK(err != -ENOSPC, "add to full qp-trie", "error %d\n", err);
+
+	/* Iterate over the sorted elements */
+	cur = NULL;
+	memcpy(sorted_keys, keys, sizeof(keys));
+	qsort(sorted_keys, ARRAY_SIZE(sorted_keys), sizeof(sorted_keys[0]), cmp_str);
+	bpf_dynptr_user_init(data, sizeof(data), &dynptr);
+	for (i = 0; i < ARRAY_SIZE(sorted_keys); i++) {
+		unsigned int len;
+		char *got;
+
+		len = strlen(sorted_keys[i]);
+		err = bpf_map_get_next_key(fd, cur, &dynptr);
+		CHECK(err, "iterate", "#%u error %d\n", i, err);
+		CHECK(bpf_dynptr_user_get_size(&dynptr) != len, "iterate",
+		      "#%u invalid len %u expect %u\n",
+		      i, bpf_dynptr_user_get_size(&dynptr), len);
+		got = bpf_dynptr_user_get_data(&dynptr);
+		CHECK(memcmp(sorted_keys[i], got, len), "iterate",
+		      "#%u got %.*s exp %.*s\n", i, len, got, len, sorted_keys[i]);
+
+		if (!cur)
+			cur = &dynptr;
+	}
+	err = bpf_map_get_next_key(fd, cur, &dynptr);
+	CHECK(err != -ENOENT, "more element", "error %d\n", err);
+
+	/* Delete all elements */
+	for (i = 0; i < ARRAY_SIZE(keys); i++) {
+		bpf_dynptr_user_init((void *)keys[i], strlen(keys[i]), &dynptr);
+		err = bpf_map_delete_elem(fd, &dynptr);
+		CHECK(err, "del elem", "#%u elem error %d\n", i, err);
+
+		/* Lookup the deleted element (Error) */
+		err = bpf_map_lookup_elem(fd, &dynptr, &got);
+		CHECK(err != -ENOENT, "lookup elem", "#%u error %d\n", i, err);
+
+		/* Lookup elements #(i~N] */
+		for (j = i + 1; j < ARRAY_SIZE(keys); j++) {
+			bpf_dynptr_user_init((void *)keys[j], strlen(keys[j]), &dynptr);
+			err = bpf_map_lookup_elem(fd, &dynptr, &got);
+			CHECK(err, "lookup elem", "#%u/%u error %d\n", i, j, err);
+			CHECK(got != j, "lookup elem", "#%u/%u expect %u got %u\n", i, j, j, got);
+		}
+	}
+
+	memset(data, 0, sizeof(data));
+	bpf_dynptr_user_init(&data, sizeof(data), &dynptr);
+	err = bpf_map_get_next_key(fd, NULL, &dynptr);
+	CHECK(err != -ENOENT, "non-empty qp-trie", "error %d\n", err);
+
+	close(fd);
+}
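
A quick aside on the iteration order the test above expects: qp-trie keys are
returned in byte-wise lexicographic order, so "abS" (0x53) sorts before "abc"
(0x63), and a key always precedes its own extensions. A small stand-alone
check of the qsort()/strcmp() expectation (illustrative only):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmp_str(const void *a, const void *b)
    {
    	return strcmp(*(const char **)a, *(const char **)b);
    }

    int main(void)
    {
    	const char *keys[] = { "ab", "abc", "abo", "abS", "abcd" };
    	size_t i;

    	qsort(keys, sizeof(keys) / sizeof(keys[0]), sizeof(keys[0]), cmp_str);
    	/* Prints: ab abS abc abcd abo */
    	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
    		printf("%s ", keys[i]);
    	printf("\n");
    	return 0;
    }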
qp-trie", "error %d\n", err); + + /* Iterate all elements */ + cur = NULL; + bpf_dynptr_user_init(&key, sizeof(key), &dynptr); + for (i = 0; i < max_entries; i++) { + unsigned int *data; + unsigned int exp; + + exp = htole32((i / 16) | ((i & 0xf) << 8)); + err = bpf_map_get_next_key(fd, cur, &dynptr); + CHECK(err, "iterate", "#%u error %d\n", i, err); + CHECK(bpf_dynptr_user_get_size(&dynptr) != 4, "iterate", + "#%u invalid len %u\n", i, bpf_dynptr_user_get_size(&dynptr)); + data = bpf_dynptr_user_get_data(&dynptr); + CHECK(data != &key, "dynptr data", "#%u got %p exp %p\n", i, data, &key); + CHECK(key != exp, "iterate", "#%u got %u exp %u\n", i, key, exp); + + if (!cur) + cur = &dynptr; + } + err = bpf_map_get_next_key(fd, cur, &dynptr); + CHECK(err != -ENOENT, "more element", "error %d\n", err); + + /* Delete all elements */ + i = max_entries; + while (i-- > 0) { + key = i; + bpf_dynptr_user_init(&key, sizeof(key), &dynptr); + err = bpf_map_delete_elem(fd, &dynptr); + CHECK(err, "del elem", "#%u error %d\n", i, err); + + /* Lookup deleted element (Error) */ + err = bpf_map_lookup_elem(fd, &dynptr, &got); + CHECK(err != -ENOENT, "lookup elem", "#%u error %d\n", i, err); + } + + bpf_dynptr_user_init(&key, sizeof(key), &dynptr); + err = bpf_map_get_next_key(fd, NULL, &dynptr); + CHECK(err != -ENOENT, "non-empty qp-trie", "error %d\n", err); + + close(fd); +} + +static int binary_insert_data(unsigned int *set, unsigned int nr, unsigned int data) +{ + int begin = 0, end = nr - 1, mid, i; + + while (begin <= end) { + mid = begin + (end - begin) / 2; + if (data == set[mid]) + return -1; + if (data > set[mid]) + begin = mid + 1; + else + end = mid - 1; + } + + /* Move [begin, nr) backwards and insert new item at begin */ + i = nr - 1; + while (i >= begin) { + set[i + 1] = set[i]; + i--; + } + set[begin] = data; + + return 0; +} + +/* UINT_MAX will not be in the returned data set */ +static unsigned int *gen_random_unique_data_set(unsigned int max_entries) +{ + unsigned int *data_set; + unsigned int i, data; + + data_set = malloc(sizeof(*data_set) * max_entries); + CHECK(!data_set, "malloc", "no mem"); + + for (i = 0; i < max_entries; i++) { + while (true) { + data = random() % UINT_MAX; + if (!binary_insert_data(data_set, i, data)) + break; + } + } + + return data_set; +} + +static int cmp_be32(const void *l, const void *r) +{ + unsigned int a = htobe32(*(unsigned int *)l), b = htobe32(*(unsigned int *)r); + + if (a < b) + return -1; + if (a > b) + return 1; + return 0; +} + +static void test_qp_trie_rdonly_iterate(void) +{ + unsigned int i, max_entries, value, data, len; + struct bpf_dynptr_user dynptr; + struct bpf_dynptr_user *cur; + unsigned int *data_set; + int fd, err; + + max_entries = 4096; + data_set = gen_random_unique_data_set(max_entries); + qsort(data_set, max_entries, sizeof(*data_set), cmp_be32); + + fd = qp_trie_create(sizeof(*data_set), sizeof(value), max_entries); + value = 1; + for (i = 0; i < max_entries; i++) { + bpf_dynptr_user_init(&data_set[i], sizeof(data_set[i]), &dynptr); + err = bpf_map_update_elem(fd, &dynptr, &value, 0); + CHECK(err, "add elem", "#%u error %d\n", i, err); + } + + /* Iteration results are big-endian ordered */ + cur = NULL; + bpf_dynptr_user_init(&data, sizeof(data), &dynptr); + for (i = 0; i < max_entries; i++) { + unsigned int *got; + + err = bpf_map_get_next_key(fd, cur, &dynptr); + CHECK(err, "iterate", "#%u error %d\n", i, err); + + got = bpf_dynptr_user_get_data(&dynptr); + len = bpf_dynptr_user_get_size(&dynptr); + CHECK(len != 4, "iterate", 
"#%u invalid len %u\n", i, len); + CHECK(got != &data, "iterate", "#%u invalid dynptr got %p exp %p\n", i, got, &data); + CHECK(*got != data_set[i], "iterate", "#%u got 0x%x exp 0x%x\n", + i, *got, data_set[i]); + cur = &dynptr; + } + err = bpf_map_get_next_key(fd, cur, &dynptr); + CHECK(err != -ENOENT, "more element", "error %d\n", err); + + /* Iterate from non-existent key */ + data = htobe32(UINT_MAX); + bpf_dynptr_user_init(&data, sizeof(data), &dynptr); + err = bpf_map_get_next_key(fd, &dynptr, &dynptr); + CHECK(err, "iterate from non-existent", "error %d\n", err); + len = bpf_dynptr_user_get_size(&dynptr); + CHECK(len != 4, "iterate", "invalid len %u\n", len); + CHECK(data != data_set[0], "iterate", "got 0x%x exp 0x%x\n", + data, data_set[0]); + + free(data_set); + + close(fd); +} + +/* + * Delete current key (also the smallest key) after iteration, the next + * iteration will return the second smallest key, so the iteration result + * is still ordered. + */ +static void test_qp_trie_iterate_then_delete(void) +{ + unsigned int i, max_entries, value, data, len; + struct bpf_dynptr_user dynptr; + struct bpf_dynptr_user *cur; + unsigned int *data_set; + int fd, err; + + max_entries = 4096; + data_set = gen_random_unique_data_set(max_entries); + qsort(data_set, max_entries, sizeof(*data_set), cmp_be32); + + fd = qp_trie_create(sizeof(*data_set), sizeof(value), max_entries); + value = 1; + for (i = 0; i < max_entries; i++) { + bpf_dynptr_user_init(&data_set[i], sizeof(data_set[i]), &dynptr); + err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST); + CHECK(err, "add elem", "#%u error %d\n", i, err); + } + + /* Iteration results are big-endian ordered */ + cur = NULL; + bpf_dynptr_user_init(&data, sizeof(data), &dynptr); + for (i = 0; i < max_entries; i++) { + err = bpf_map_get_next_key(fd, cur, &dynptr); + CHECK(err, "iterate", "#%u error %d\n", i, err); + + len = bpf_dynptr_user_get_size(&dynptr); + CHECK(len != 4, "iterate", "#%u invalid len %u\n", i, len); + CHECK(data != data_set[i], "iterate", "#%u got 0x%x exp 0x%x\n", + i, data, data_set[i]); + cur = &dynptr; + + /* + * Delete the mininal key, next call of bpf_get_next_key() will + * return the second minimal key. 
+ */ + err = bpf_map_delete_elem(fd, &dynptr); + CHECK(err, "del elem", "#%u elem error %d\n", i, err); + } + err = bpf_map_get_next_key(fd, cur, &dynptr); + CHECK(err != -ENOENT, "more element", "error %d\n", err); + + err = bpf_map_get_next_key(fd, NULL, &dynptr); + CHECK(err != -ENOENT, "no-empty qp-trie", "error %d\n", err); + + free(data_set); + + close(fd); +} + +/* The range is half-closed: [from, to) */ +static void delete_random_keys_in_range(int fd, unsigned int *data_set, + unsigned int from, unsigned int to) +{ + unsigned int del_from, del_to; + + if (from >= to) + return; + + del_from = random() % (to - from) + from; + del_to = random() % (to - del_from) + del_from; + for (; del_from <= del_to; del_from++) { + struct bpf_dynptr_user dynptr; + int err; + + /* Skip deleted keys */ + if (data_set[del_from] == UINT_MAX) + continue; + + bpf_dynptr_user_init(&data_set[del_from], sizeof(data_set[del_from]), &dynptr); + err = bpf_map_delete_elem(fd, &dynptr); + CHECK(err, "del elem", "#%u range %u-%u error %d\n", del_from, from, to, err); + data_set[del_from] = UINT_MAX; + } +} + +/* Delete keys randomly and ensure the iteration returns the expected data */ +static void test_qp_trie_iterate_then_batch_delete(void) +{ + unsigned int i, max_entries, value, data, len; + struct bpf_dynptr_user dynptr; + struct bpf_dynptr_user *cur; + unsigned int *data_set; + int fd, err; + + max_entries = 8192; + data_set = gen_random_unique_data_set(max_entries); + qsort(data_set, max_entries, sizeof(*data_set), cmp_be32); + + fd = qp_trie_create(sizeof(*data_set), sizeof(value), max_entries); + value = 1; + for (i = 0; i < max_entries; i++) { + bpf_dynptr_user_init(&data_set[i], sizeof(data_set[i]), &dynptr); + err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST); + CHECK(err, "add elem", "#%u error %d\n", i, err); + } + + cur = NULL; + bpf_dynptr_user_init(&data, sizeof(data), &dynptr); + for (i = 0; i < max_entries; i++) { + err = bpf_map_get_next_key(fd, cur, &dynptr); + CHECK(err, "iterate", "#%u error %d\n", i, err); + + len = bpf_dynptr_user_get_size(&dynptr); + CHECK(len != 4, "iterate", "#%u invalid len %u\n", i, len); + CHECK(data != data_set[i], "iterate", "#%u got 0x%x exp 0x%x\n", + i, data, data_set[i]); + cur = &dynptr; + + /* Delete some keys from iterated keys */ + delete_random_keys_in_range(fd, data_set, 0, i); + + /* Skip deleted keys */ + while (i + 1 < max_entries) { + if (data_set[i + 1] != UINT_MAX) + break; + i++; + } + + /* Delete some keys from to-iterate keys */ + delete_random_keys_in_range(fd, data_set, i + 1, max_entries); + + /* Skip deleted keys */ + while (i + 1 < max_entries) { + if (data_set[i + 1] != UINT_MAX) + break; + i++; + } + } + err = bpf_map_get_next_key(fd, cur, &dynptr); + CHECK(err != -ENOENT, "more element", "error %d\n", err); + + free(data_set); + + close(fd); +} + +/* + * Add keys with odd index first and add keys with even index during iteration. + * Check whether or not the whole key set is returned by iteration procedure. 
+ */ +static void test_qp_trie_iterate_then_add(void) +{ + unsigned int i, max_entries, value, data, len; + struct bpf_dynptr_user dynptr, next_key; + struct bpf_dynptr_user *cur; + unsigned int *data_set; + int fd, err; + + max_entries = 8192; + data_set = gen_random_unique_data_set(max_entries); + qsort(data_set, max_entries, sizeof(*data_set), cmp_be32); + + fd = qp_trie_create(sizeof(*data_set), sizeof(value), max_entries); + value = 1; + for (i = 0; i < max_entries; i++) { + if (i & 1) + continue; + + bpf_dynptr_user_init(&data_set[i], sizeof(data_set[i]), &dynptr); + err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST); + CHECK(err, "add elem", "#%u error %d\n", i, err); + } + + /* Iteration results are big-endian ordered */ + cur = NULL; + bpf_dynptr_user_init(&data, sizeof(data), &next_key); + for (i = 0; i < max_entries; i++) { + err = bpf_map_get_next_key(fd, cur, &next_key); + CHECK(err, "iterate", "#%u error %d\n", i, err); + + len = bpf_dynptr_user_get_size(&next_key); + CHECK(len != 4, "iterate", "#%u invalid len %u\n", i, len); + CHECK(data != data_set[i], "iterate", "#%u got 0x%x exp 0x%x\n", + i, data, data_set[i]); + cur = &next_key; + + if ((i & 1) || i + 1 >= max_entries) + continue; + + /* Add key with odd index which be returned in next iteration */ + bpf_dynptr_user_init(&data_set[i + 1], sizeof(data_set[i + 1]), &dynptr); + err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST); + CHECK(err, "add elem", "#%u error %d\n", i + 1, err); + } + err = bpf_map_get_next_key(fd, cur, &next_key); + CHECK(err != -ENOENT, "more element", "error %d\n", err); + + free(data_set); + + close(fd); +} + +static int get_int_from_env(const char *key, int dft) +{ + const char *value = getenv(key); + + if (!value) + return dft; + return atoi(value); +} + +static void free_bytes_set(struct bpf_dynptr_user *set, unsigned int nr) +{ + unsigned int i; + + for (i = 0; i < nr; i++) + free(bpf_dynptr_user_get_data(&set[i])); + free(set); +} + +struct bpf_dynptr_user *generate_random_bytes_set(unsigned int max_key_len, unsigned int nr) +{ + struct bpf_dynptr_user *set; + unsigned int i; + + set = malloc(nr * sizeof(*set)); + CHECK(!set, "malloc", "no mem for set"); + + for (i = 0; i < nr; i++) { + unsigned char *data; + unsigned int len, j; + + len = random() % max_key_len + 1; + data = malloc(len); + CHECK(!data, "maloc", "no mem for data"); + + j = 0; + while (j + 4 <= len) { + unsigned int rnd = random(); + + memcpy(&data[j], &rnd, sizeof(rnd)); + j += 4; + } + while (j < len) + data[j++] = random(); + + bpf_dynptr_user_init(data, len, &set[i]); + } + + return set; +} + +static struct bpf_dynptr_user *alloc_dynptr_user(unsigned int len) +{ + struct bpf_dynptr_user *dynptr; + + dynptr = malloc(sizeof(*dynptr) + len); + if (!dynptr) + return NULL; + + bpf_dynptr_user_init(&dynptr[1], len, dynptr); + + return dynptr; +} + +static int cmp_dynptr_user(const struct bpf_dynptr_user *a, const struct bpf_dynptr_user *b) +{ + unsigned int a_len = bpf_dynptr_user_get_size(a), b_len = bpf_dynptr_user_get_size(b); + unsigned int cmp = a_len < b_len ? 
a_len : b_len; + int ret; + + ret = memcmp(bpf_dynptr_user_get_data(a), bpf_dynptr_user_get_data(b), cmp); + if (ret) + return ret; + return a_len - b_len; +} + +static void dump_dynptr_user(const char *name, const struct bpf_dynptr_user *ptr) +{ + unsigned char *data = bpf_dynptr_user_get_data(ptr); + unsigned int i, len = bpf_dynptr_user_get_size(ptr); + + fprintf(stderr, "%s dynptr len %u data %p\n", name, len, data); + + for (i = 0; i < len; i++) { + fprintf(stderr, "%02x ", data[i]); + if (i % 16 == 15) + fprintf(stderr, "\n"); + } + fprintf(stderr, "\n"); +} + +static void copy_and_reset_dynptr_user(struct bpf_dynptr_user *dst_ptr, + struct bpf_dynptr_user *src_ptr, unsigned int reset_len) +{ + unsigned char *dst = bpf_dynptr_user_get_data(dst_ptr); + unsigned char *src = bpf_dynptr_user_get_data(src_ptr); + unsigned int src_len = bpf_dynptr_user_get_size(src_ptr); + + memcpy(dst, src, src_len); + bpf_dynptr_user_init(dst, src_len, dst_ptr); + bpf_dynptr_user_init(src, reset_len, src_ptr); +} + +static void *update_fn(void *arg) +{ + const struct qp_trie_rw_ctx *ctx = arg; + unsigned int i, j; + + for (i = 0; i < ctx->loop; i++) { + for (j = 0; j < ctx->nr; j++) { + unsigned int value; + int err; + + value = bpf_dynptr_user_get_size(&ctx->set[i]); + err = bpf_map_update_elem(ctx->fd, &ctx->set[i], &value, BPF_ANY); + if (err) { + fprintf(stderr, "update #%u element error %d\n", j, err); + return (void *)(long)err; + } + } + } + + return NULL; +} + +static void *delete_fn(void *arg) +{ + const struct qp_trie_rw_ctx *ctx = arg; + unsigned int i, j; + + for (i = 0; i < ctx->loop; i++) { + for (j = 0; j < ctx->nr; j++) { + int err; + + err = bpf_map_delete_elem(ctx->fd, &ctx->set[i]); + if (err && err != -ENOENT) { + fprintf(stderr, "delete #%u element error %d\n", j, err); + return (void *)(long)err; + } + } + } + + return NULL; +} + +static void *lookup_fn(void *arg) +{ + const struct qp_trie_rw_ctx *ctx = arg; + unsigned int i, j; + + for (i = 0; i < ctx->loop; i++) { + for (j = 0; j < ctx->nr; j++) { + unsigned int got, value; + int err; + + got = 0; + value = bpf_dynptr_user_get_size(&ctx->set[i]); + err = bpf_map_lookup_elem(ctx->fd, &ctx->set[i], &got); + if (!err && got != value) { + fprintf(stderr, "lookup #%u element got %u expected %u\n", j, got, value); + return (void *)(long)err; + } else if (err && err != -ENOENT) { + fprintf(stderr, "lookup #%u element error %d\n", j, err); + return (void *)(long)err; + } + } + } + + return NULL; +} + +static void *iterate_fn(void *arg) +{ + const struct qp_trie_rw_ctx *ctx = arg; + struct bpf_dynptr_user *key, *next_key; + unsigned int i; + int err; + + key = NULL; + next_key = alloc_dynptr_user(ctx->max_key_len); + if (!next_key) + return (void *)(long)-ENOMEM; + + err = 0; + for (i = 0; i < ctx->loop; i++) { + while (true) { + err = bpf_map_get_next_key(ctx->fd, key, next_key); + if (err < 0) { + if (err != -ENOENT) { + fprintf(stderr, "get key error %d\n", err); + goto out; + } + err = 0; + break; + } + + /* If no deletion, next key should be greater than key */ + if (!ctx->nr_delete && key && cmp_dynptr_user(key, next_key) >= 0) { + fprintf(stderr, "unordered iteration result\n"); + dump_dynptr_user("previous key", key); + dump_dynptr_user("cur key", next_key); + err = -EINVAL; + goto out; + } + + if (!key) { + key = alloc_dynptr_user(ctx->max_key_len); + if (!key) { + err = -ENOMEM; + goto out; + } + } + + /* Copy next_key to key, and reset next_key */ + copy_and_reset_dynptr_user(key, next_key, ctx->max_key_len); + } + + free(key); 
+ key = NULL; + } + +out: + free(key); + free(next_key); + return (void *)(long)err; +} + +static void do_qp_trie_stress_test(const struct stress_conf *conf) +{ + void *(*fns[MAX_OP])(void *arg) = { + update_fn, delete_fn, lookup_fn, iterate_fn, + }; + unsigned int created[MAX_OP]; + struct qp_trie_rw_ctx ctx; + pthread_t *tids[MAX_OP]; + unsigned int op, i, err; + + ctx.nr = conf->nr; + ctx.max_key_len = conf->max_key_len; + ctx.fd = qp_trie_create(ctx.max_key_len, sizeof(unsigned int), ctx.nr); + ctx.set = generate_random_bytes_set(ctx.max_key_len, ctx.nr); + ctx.loop = conf->loop; + ctx.nr_delete = conf->threads[DELETE_OP]; + + /* Create threads */ + for (op = 0; op < ARRAY_SIZE(tids); op++) { + if (!conf->threads[op]) { + tids[op] = NULL; + continue; + } + + tids[op] = malloc(conf->threads[op] * sizeof(*tids[op])); + CHECK(!tids[op], "malloc", "no mem for op %u threads %u\n", op, conf->threads[op]); + } + + for (op = 0; op < ARRAY_SIZE(tids); op++) { + for (i = 0; i < conf->threads[op]; i++) { + err = pthread_create(&tids[op][i], NULL, fns[op], &ctx); + if (err) { + fprintf(stderr, "create #%u thread for op %u error %d\n", i, op, err); + break; + } + } + created[op] = i; + } + + err = 0; + for (op = 0; op < ARRAY_SIZE(tids); op++) { + for (i = 0; i < created[op]; i++) { + void *thread_err = NULL; + + pthread_join(tids[op][i], &thread_err); + if (thread_err) + err |= 1 << op; + } + } + CHECK(err, "stress operation", "err %u\n", err); + + for (op = 0; op < ARRAY_SIZE(tids); op++) + free(tids[op]); + free_bytes_set(ctx.set, ctx.nr); + close(ctx.fd); +} + +static void test_qp_trie_stress(void) +{ + struct stress_conf conf; + + memset(&conf, 0, sizeof(conf)); + + /* Test concurrently update, lookup and iterate operations. There is + * no deletion, so iteration can check the order of returned keys. + */ + conf.threads[UPDATE_OP] = get_int_from_env("QP_TRIE_NR_UPDATE", 8); + conf.threads[LOOKUP_OP] = get_int_from_env("QP_TRIE_NR_LOOKUP", 8); + conf.threads[ITERATE_OP] = get_int_from_env("QP_TRIE_NR_ITERATE", 8); + conf.max_key_len = get_int_from_env("QP_TRIE_MAX_KEY_LEN", 256); + conf.loop = get_int_from_env("QP_TRIE_NR_LOOP", 32); + conf.nr = get_int_from_env("QP_TRIE_NR_DATA", 8192); + do_qp_trie_stress_test(&conf); + + /* Add delete operation */ + conf.threads[DELETE_OP] = get_int_from_env("QP_TRIE_NR_DELETE", 8); + do_qp_trie_stress_test(&conf); +} + +void test_qp_trie_map(void) +{ + test_qp_trie_create(); + + test_qp_trie_bad_update(); + + test_qp_trie_bad_lookup_delete(); + + test_qp_trie_one_subtree_update(); + + test_qp_trie_all_subtree_update(); + + test_qp_trie_rdonly_iterate(); + + test_qp_trie_iterate_then_delete(); + + test_qp_trie_iterate_then_batch_delete(); + + test_qp_trie_iterate_then_add(); + + test_qp_trie_stress(); + + printf("%s:PASS\n", __func__); +}
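
The stress knobs are read through get_int_from_env(), so they can be tuned
without recompiling. A hypothetical driver, assuming it is linked against the
selftest object that provides test_qp_trie_map() (the knob values are just
examples; the defaults above apply when the variables are unset):

    #include <stdlib.h>

    extern void test_qp_trie_map(void);

    int main(void)
    {
    	/* Override a few stress-test knobs before running the suite */
    	setenv("QP_TRIE_NR_UPDATE", "16", 1);
    	setenv("QP_TRIE_MAX_KEY_LEN", "512", 1);
    	setenv("QP_TRIE_NR_LOOP", "64", 1);
    	test_qp_trie_map();
    	return 0;
    }

Equivalently, the variables can simply be exported in the environment of the
test_maps binary that runs the map_tests suite.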