From patchwork Tue Jan 7 08:55:53 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 13928455
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
 Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
 Stanislav Fomichev, Jiri Olsa, John Fastabend, Sebastian Andrzej Siewior,
 houtao1@huawei.com, xukuohai@huawei.com
Subject: [PATCH bpf-next 1/7] bpf: Free special fields after unlock in htab_lru_map_delete_node()
Date: Tue, 7 Jan 2025 16:55:53 +0800
Message-Id: <20250107085559.3081563-2-houtao@huaweicloud.com>
In-Reply-To: <20250107085559.3081563-1-houtao@huaweicloud.com>
References: <20250107085559.3081563-1-houtao@huaweicloud.com>

From: Hou Tao

When a bpf_timer is used in an LRU hash map, calling check_and_free_fields()
in htab_lru_map_delete_node() invokes bpf_timer_cancel_and_free() to free the
bpf_timer. If the timer is running on another CPU and PREEMPT_RT is enabled,
hrtimer_cancel() will invoke hrtimer_cancel_wait_running(), which tries to
acquire a spin-lock (a sleeping lock on PREEMPT_RT). However,
htab_lru_map_delete_node() has already acquired a raw-spin-lock, so this
violates the lockdep rules and may trigger a "BUG: scheduling while atomic"
warning.

Fix the issue by moving the invocation of check_and_free_fields() out of the
bucket lock.
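For context, the invalid nesting on PREEMPT_RT looks roughly like this (a
simplified call chain reconstructed from this description and the splat
quoted in patch 6, not verbatim kernel code):

	htab_lru_map_delete_node()
	  htab_lock_bucket()               // raw spinlock, preemption disabled
	    check_and_free_fields()
	      bpf_timer_cancel_and_free()
	        hrtimer_cancel()
	          hrtimer_cancel_wait_running()
	            spin_lock(&base->softirq_expiry_lock)  // sleeping lock on RT
	  htab_unlock_bucket()

Taking a sleeping spinlock_t while holding a raw_spinlock_t is what produces
the "scheduling while atomic" splat; moving check_and_free_fields() after
htab_unlock_bucket() takes the sleeping lock outside of the atomic section.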
Signed-off-by: Hou Tao
---
 kernel/bpf/hashtab.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 007571ce4343..8fecf21c132a 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -826,13 +826,14 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
 	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
 		if (l == tgt_l) {
 			hlist_nulls_del_rcu(&l->hash_node);
-			check_and_free_fields(htab, l);
 			bpf_map_dec_elem_count(&htab->map);
 			break;
 		}
 
 	htab_unlock_bucket(htab, b, tgt_l->hash, flags);
 
+	if (l == tgt_l)
+		check_and_free_fields(htab, l);
 	return l == tgt_l;
 }

From patchwork Tue Jan 7 08:55:54 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 13928454
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
 Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
 Stanislav Fomichev, Jiri Olsa, John Fastabend, Sebastian Andrzej Siewior,
 houtao1@huawei.com, xukuohai@huawei.com
Subject: [PATCH bpf-next 2/7] bpf: Bail out early in __htab_map_lookup_and_delete_elem()
Date: Tue, 7 Jan 2025 16:55:54 +0800
Message-Id: <20250107085559.3081563-3-houtao@huaweicloud.com>
In-Reply-To: <20250107085559.3081563-1-houtao@huaweicloud.com>
References: <20250107085559.3081563-1-houtao@huaweicloud.com>

From: Hou Tao
Use a goto statement to bail out early when the target element is not found,
instead of wrapping the more likely case in a large else branch. This change
doesn't affect functionality; it simply makes the code cleaner.
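The shape of the change, as a minimal sketch with placeholder lock helpers
(lock_bucket()/unlock_bucket() stand in for htab_lock_bucket() and
htab_unlock_bucket(); this is not the htab code itself):

	lock_bucket(b);
	l = lookup_elem_raw(head, hash, key, key_size);
	if (!l) {
		ret = -ENOENT;
		goto out_unlock;	/* unlikely error path bails out early */
	}
	/* likely path continues at a single indentation level */
	...
out_unlock:
	unlock_bucket(b);
	return ret;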
Signed-off-by: Hou Tao
---
 kernel/bpf/hashtab.c | 51 ++++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 8fecf21c132a..59eb117908c5 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1641,37 +1641,38 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 	l = lookup_elem_raw(head, hash, key, key_size);
 	if (!l) {
 		ret = -ENOENT;
-	} else {
-		if (is_percpu) {
-			u32 roundup_value_size = round_up(map->value_size, 8);
-			void __percpu *pptr;
-			int off = 0, cpu;
+		goto out_unlock;
+	}
 
-			pptr = htab_elem_get_ptr(l, key_size);
-			for_each_possible_cpu(cpu) {
-				copy_map_value_long(&htab->map, value + off, per_cpu_ptr(pptr, cpu));
-				check_and_init_map_value(&htab->map, value + off);
-				off += roundup_value_size;
-			}
-		} else {
-			u32 roundup_key_size = round_up(map->key_size, 8);
+	if (is_percpu) {
+		u32 roundup_value_size = round_up(map->value_size, 8);
+		void __percpu *pptr;
+		int off = 0, cpu;
 
-			if (flags & BPF_F_LOCK)
-				copy_map_value_locked(map, value, l->key +
-						      roundup_key_size,
-						      true);
-			else
-				copy_map_value(map, value, l->key +
-					       roundup_key_size);
-			/* Zeroing special fields in the temp buffer */
-			check_and_init_map_value(map, value);
+		pptr = htab_elem_get_ptr(l, key_size);
+		for_each_possible_cpu(cpu) {
+			copy_map_value_long(&htab->map, value + off, per_cpu_ptr(pptr, cpu));
+			check_and_init_map_value(&htab->map, value + off);
+			off += roundup_value_size;
 		}
+	} else {
+		u32 roundup_key_size = round_up(map->key_size, 8);
 
-		hlist_nulls_del_rcu(&l->hash_node);
-		if (!is_lru_map)
-			free_htab_elem(htab, l);
+		if (flags & BPF_F_LOCK)
+			copy_map_value_locked(map, value, l->key +
+					      roundup_key_size,
+					      true);
+		else
+			copy_map_value(map, value, l->key +
+				       roundup_key_size);
+		/* Zeroing special fields in the temp buffer */
+		check_and_init_map_value(map, value);
 	}
+	hlist_nulls_del_rcu(&l->hash_node);
+	if (!is_lru_map)
+		free_htab_elem(htab, l);
+
+out_unlock:
 	htab_unlock_bucket(htab, b, hash, bflags);
 
 	if (is_lru_map && l)

From patchwork Tue Jan 7 08:55:55 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 13928456
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
 Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
 Stanislav Fomichev, Jiri Olsa, John Fastabend, Sebastian Andrzej Siewior,
 houtao1@huawei.com, xukuohai@huawei.com
Subject: [PATCH bpf-next 3/7] bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
Date: Tue, 7 Jan 2025 16:55:55 +0800
Message-Id: <20250107085559.3081563-4-houtao@huaweicloud.com>
In-Reply-To: <20250107085559.3081563-1-houtao@huaweicloud.com>
References: <20250107085559.3081563-1-houtao@huaweicloud.com>

From: Hou Tao
Freeing the special fields in a map value may acquire a spin-lock (e.g., when
freeing a bpf_timer). However, the lookup_and_delete_elem procedure already
holds a raw-spin-lock at that point, which violates the lockdep rules.

The running context of __htab_map_lookup_and_delete_elem() has already
disabled migration, so it is OK to invoke free_htab_elem() after unlocking
the bucket lock.

Fix the potential problem by freeing the element after unlocking the bucket
lock in __htab_map_lookup_and_delete_elem().
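The resulting structure is the usual "unlink under the lock, free after the
lock" pattern, condensed here from the diff below (surrounding code elided):

	htab_lock_bucket(htab, b, hash, &bflags);
	/* ... */
	hlist_nulls_del_rcu(&l->hash_node);	/* unlink only, no freeing */
out_unlock:
	htab_unlock_bucket(htab, b, hash, bflags);

	if (l) {				/* free outside the raw lock */
		if (is_lru_map)
			htab_lru_push_free(htab, l);
		else
			free_htab_elem(htab, l);
	}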
Signed-off-by: Hou Tao
---
 kernel/bpf/hashtab.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 59eb117908c5..903447a340d3 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1669,14 +1669,16 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 		check_and_init_map_value(map, value);
 	}
 	hlist_nulls_del_rcu(&l->hash_node);
-	if (!is_lru_map)
-		free_htab_elem(htab, l);
 
 out_unlock:
 	htab_unlock_bucket(htab, b, hash, bflags);
 
-	if (is_lru_map && l)
-		htab_lru_push_free(htab, l);
+	if (l) {
+		if (is_lru_map)
+			htab_lru_push_free(htab, l);
+		else
+			free_htab_elem(htab, l);
+	}
 
 	return ret;
 }

From patchwork Tue Jan 7 08:55:56 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 13928453
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
 Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
 Stanislav Fomichev, Jiri Olsa, John Fastabend, Sebastian Andrzej Siewior,
 houtao1@huawei.com, xukuohai@huawei.com
Subject: [PATCH bpf-next 4/7] bpf: Support refilling extra_elems in free_htab_elem()
Date: Tue, 7 Jan 2025 16:55:56 +0800
Message-Id: <20250107085559.3081563-5-houtao@huaweicloud.com>
In-Reply-To: <20250107085559.3081563-1-houtao@huaweicloud.com>
References: <20250107085559.3081563-1-houtao@huaweicloud.com>

From: Hou Tao
A following patch will move the invocation of check_and_free_fields() in
htab_map_update_elem() outside of the bucket lock. However, the bucket lock
is currently necessary because the overwritten element has already been
stashed in htab->extra_elems by the time alloc_htab_elem() returns. If
check_and_free_fields() were invoked after the bucket lock is released, the
stashed element could be reused by a concurrent update procedure, and the
freeing in check_and_free_fields() would race with that reuse and lead to
bugs.

The fix breaks the reuse and stash of extra_elems into two steps:
1) reuse the per-cpu extra_elems with the bucket lock held.
2) refill the per-cpu extra_elems after the bucket lock is released.

This patch adds support for stashing the per-cpu extra_elems after the bucket
lock is released. The refill may run concurrently, therefore cmpxchg_release()
is used. The _release semantics are necessary to ensure that the freeing of
ptrs or special fields in the map value is complete before the element is
reused by a concurrent update.
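Put together with the reuse side added in patch 6, the release/acquire
pairing is as follows (condensed from the two diffs, with surrounding code
elided):

	/* Freeing side, after the bucket lock is released: the release
	 * semantics make every store done by check_and_free_fields()
	 * visible before the element can be observed in extra_elems.
	 */
	check_and_free_fields(htab, l);
	extra = this_cpu_ptr(htab->extra_elems);
	if (cmpxchg_release(extra, NULL, l) == NULL)
		return;	/* stashed; a later update may reuse it */

	/* Reusing side, under the bucket lock: pairs with the
	 * cmpxchg_release() above.
	 */
	pl_new = this_cpu_ptr(htab->extra_elems);
	l_new = smp_load_acquire(pl_new);
	if (l_new)
		WRITE_ONCE(*pl_new, NULL);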
Signed-off-by: Hou Tao
---
 kernel/bpf/hashtab.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 903447a340d3..3c6eebabb492 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -946,14 +946,28 @@ static void dec_elem_count(struct bpf_htab *htab)
 	atomic_dec(&htab->count);
 }
 
-static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
+static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l, bool refill_extra)
 {
 	htab_put_fd_value(htab, l);
 
 	if (htab_is_prealloc(htab)) {
-		bpf_map_dec_elem_count(&htab->map);
 		check_and_free_fields(htab, l);
+
+		if (refill_extra) {
+			struct htab_elem **extra;
+
+			/* Use cmpxchg_release() to ensure the freeing of ptrs
+			 * or special fields in map value is completed when the
+			 * update procedure reuses the extra element. It will
+			 * pair with smp_load_acquire() when reading extra_elems
+			 * pointer.
+			 */
+			extra = this_cpu_ptr(htab->extra_elems);
+			if (cmpxchg_release(extra, NULL, l) == NULL)
+				return;
+		}
+
+		bpf_map_dec_elem_count(&htab->map);
 		pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
@@ -1207,7 +1221,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		if (old_map_ptr)
 			map->ops->map_fd_put_ptr(map, old_map_ptr, true);
 		if (!htab_is_prealloc(htab))
-			free_htab_elem(htab, l_old);
+			free_htab_elem(htab, l_old, false);
 	}
 	return 0;
 err:
@@ -1461,7 +1475,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
 	htab_unlock_bucket(htab, b, hash, flags);
 
 	if (l)
-		free_htab_elem(htab, l);
+		free_htab_elem(htab, l, false);
 	return ret;
 }
 
@@ -1677,7 +1691,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 		if (is_lru_map)
 			htab_lru_push_free(htab, l);
 		else
-			free_htab_elem(htab, l);
+			free_htab_elem(htab, l, false);
 	}
 
 	return ret;
@@ -1899,7 +1913,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 			if (is_lru_map)
 				htab_lru_push_free(htab, l);
 			else
-				free_htab_elem(htab, l);
+				free_htab_elem(htab, l, false);
 		}
 	next_batch:

From patchwork Tue Jan 7 08:55:57 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 13928458
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
 Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
 Stanislav Fomichev, Jiri Olsa, John Fastabend, Sebastian Andrzej Siewior,
 houtao1@huawei.com, xukuohai@huawei.com
Subject: [PATCH bpf-next 5/7] bpf: Factor out the element allocation for pre-allocated htab
Date: Tue, 7 Jan 2025 16:55:57 +0800
Message-Id: <20250107085559.3081563-6-houtao@huaweicloud.com>
In-Reply-To: <20250107085559.3081563-1-houtao@huaweicloud.com>
References: <20250107085559.3081563-1-houtao@huaweicloud.com>

From: Hou Tao
The element allocation for a pre-allocated htab is composed of two logics:

1) when there is an old element, directly reuse the per-cpu extra_elems,
   and stash the old element as the next per-cpu extra_elems
2) when there is no old element, allocate from the per-cpu free list

The reuse and stash of the per-cpu extra_elems will be broken into two
independent steps. After that, the per-cpu extra_elems may be NULL when
trying to reuse it, and the allocation will need to fall back to the per-cpu
free list when that happens. Therefore, factor out the element allocation
into a helper to make the following change straightforward; a sketch of the
resulting control flow follows.
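For orientation, after both this patch and patch 6 are applied the helper
ends up looking roughly like this (condensed from the two diffs, not
verbatim):

	static struct htab_elem *alloc_preallocated_htab_elem(struct bpf_htab *htab,
							      struct htab_elem *old_elem)
	{
		struct pcpu_freelist_node *l;
		struct htab_elem *l_new;

		if (old_elem) {
			struct htab_elem **pl_new = this_cpu_ptr(htab->extra_elems);

			/* may be NULL if a preempted update hasn't refilled it yet */
			l_new = smp_load_acquire(pl_new);
			if (l_new) {
				WRITE_ONCE(*pl_new, NULL);
				return l_new;
			}
		}

		/* fall back to the per-cpu free list */
		l = __pcpu_freelist_pop(&htab->freelist);
		if (!l)
			return ERR_PTR(-E2BIG);

		bpf_map_inc_elem_count(&htab->map);
		return container_of(l, struct htab_elem, fnode);
	}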
Signed-off-by: Hou Tao
---
 kernel/bpf/hashtab.c | 49 +++++++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 3c6eebabb492..9211df2adda4 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1021,6 +1021,34 @@ static bool fd_htab_map_needs_adjust(const struct bpf_htab *htab)
 	       BITS_PER_LONG == 64;
 }
 
+static struct htab_elem *alloc_preallocated_htab_elem(struct bpf_htab *htab,
+						      struct htab_elem *old_elem)
+{
+	struct pcpu_freelist_node *l;
+	struct htab_elem *l_new;
+
+	if (old_elem) {
+		struct htab_elem **pl_new;
+
+		/* if we're updating the existing element,
+		 * use per-cpu extra elems to avoid freelist_pop/push
+		 */
+		pl_new = this_cpu_ptr(htab->extra_elems);
+		l_new = *pl_new;
+		*pl_new = old_elem;
+		return l_new;
+	}
+
+	l = __pcpu_freelist_pop(&htab->freelist);
+	if (!l)
+		return ERR_PTR(-E2BIG);
+
+	l_new = container_of(l, struct htab_elem, fnode);
+	bpf_map_inc_elem_count(&htab->map);
+
+	return l_new;
+}
+
 static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 					 void *value, u32 key_size, u32 hash,
 					 bool percpu, bool onallcpus,
@@ -1028,26 +1056,13 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 {
 	u32 size = htab->map.value_size;
 	bool prealloc = htab_is_prealloc(htab);
-	struct htab_elem *l_new, **pl_new;
+	struct htab_elem *l_new;
 	void __percpu *pptr;
 
 	if (prealloc) {
-		if (old_elem) {
-			/* if we're updating the existing element,
-			 * use per-cpu extra elems to avoid freelist_pop/push
-			 */
-			pl_new = this_cpu_ptr(htab->extra_elems);
-			l_new = *pl_new;
-			*pl_new = old_elem;
-		} else {
-			struct pcpu_freelist_node *l;
-
-			l = __pcpu_freelist_pop(&htab->freelist);
-			if (!l)
-				return ERR_PTR(-E2BIG);
-			l_new = container_of(l, struct htab_elem, fnode);
-			bpf_map_inc_elem_count(&htab->map);
-		}
+		l_new = alloc_preallocated_htab_elem(htab, old_elem);
+		if (IS_ERR(l_new))
+			return l_new;
 	} else {
 		if (is_map_full(htab))
 			if (!old_elem)

From patchwork Tue Jan 7 08:55:58 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 13928459
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
 Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
 Stanislav Fomichev, Jiri Olsa, John Fastabend, Sebastian Andrzej Siewior,
 houtao1@huawei.com, xukuohai@huawei.com
Subject: [PATCH bpf-next 6/7] bpf: Free element after unlock for pre-allocated htab
Date: Tue, 7 Jan 2025 16:55:58 +0800
Message-Id: <20250107085559.3081563-7-houtao@huaweicloud.com>
In-Reply-To: <20250107085559.3081563-1-houtao@huaweicloud.com>
References: <20250107085559.3081563-1-houtao@huaweicloud.com>

From: Hou Tao
During the update procedure, when overwriting an element in a pre-allocated
htab, the freeing of the old element is protected by the bucket lock. The
bucket lock is necessary because the old element has already been stashed in
htab->extra_elems by the time alloc_htab_elem() returns. If the old element
were freed after the bucket lock is released, the stashed element could be
reused by a concurrent update procedure, and the freeing of the old element
would race with that reuse. However, the invocation of
check_and_free_fields() may acquire a spin-lock, which violates the lockdep
rules because its caller already holds a raw-spin-lock (the bucket lock).
The following warning is reported when such a race happens:

  BUG: scheduling while atomic: test_progs/676/0x00000003
  3 locks held by test_progs/676:
  #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
  #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
  #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
  Modules linked in: bpf_testmod(O)
  Preemption disabled at:
  [] htab_map_update_elem+0x293/0x1500
  CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
  Tainted: [W]=WARN, [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
  Call Trace:
   dump_stack_lvl+0x57/0x70
   dump_stack+0x10/0x20
   __schedule_bug+0x120/0x170
   __schedule+0x300c/0x4800
   schedule_rtlock+0x37/0x60
   rtlock_slowlock_locked+0x6d9/0x54c0
   rt_spin_lock+0x168/0x230
   hrtimer_cancel_wait_running+0xe9/0x1b0
   hrtimer_cancel+0x24/0x30
   bpf_timer_delete_work+0x1d/0x40
   bpf_timer_cancel_and_free+0x5e/0x80
   bpf_obj_free_fields+0x262/0x4a0
   check_and_free_fields+0x1d0/0x280
   htab_map_update_elem+0x7fc/0x1500
   bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43
   bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e
   bpf_prog_test_run_syscall+0x322/0x830
   __sys_bpf+0x135d/0x3ca0
   __x64_sys_bpf+0x75/0xb0
   x64_sys_call+0x1b5/0xa10
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  ...

To fix the problem, the patch breaks the reuse and refill of the per-cpu
extra_elems into two independent parts: reuse the per-cpu extra_elems with
the bucket lock held, and refill the old element as the per-cpu extra_elems
after the bucket lock is released. After this split, it is safe to free a
pre-allocated element after the bucket lock is released. The window in which
a concurrent update can observe a NULL extra element is sketched below.
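Concretely, with both updates running on the same CPU (an illustrative
timeline, not from the source):

	update A                              update B (preempts A)
	--------                              ---------------------
	lock bucket
	  take extra_elems, write NULL back
	unlock bucket
	              <-- preemption -->      lock bucket
	                                        extra_elems == NULL
	                                        fall back to freelist pop
	                                      unlock bucket
	free old element's special fields
	refill extra_elems (cmpxchg_release)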
Reported-by: Sebastian Andrzej Siewior
Signed-off-by: Hou Tao
---
 kernel/bpf/hashtab.c | 43 ++++++++++++++++---------------------------
 1 file changed, 16 insertions(+), 27 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 9211df2adda4..83c96c8941f0 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1034,9 +1034,16 @@ static struct htab_elem *alloc_preallocated_htab_elem(struct bpf_htab *htab,
 		 * use per-cpu extra elems to avoid freelist_pop/push
 		 */
 		pl_new = this_cpu_ptr(htab->extra_elems);
-		l_new = *pl_new;
-		*pl_new = old_elem;
-		return l_new;
+		/* Paired with cmpxchg_release() in free_htab_elem() */
+		l_new = smp_load_acquire(pl_new);
+		/* extra_elems can be NULL if the current update operation
+		 * preempts another update operation that hasn't yet refilled
+		 * the per-cpu extra_elems.
+		 */
+		if (l_new) {
+			WRITE_ONCE(*pl_new, NULL);
+			return l_new;
+		}
 	}
 
 	l = __pcpu_freelist_pop(&htab->freelist);
@@ -1139,7 +1146,6 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	struct htab_elem *l_new = NULL, *l_old;
 	struct hlist_nulls_head *head;
 	unsigned long flags;
-	void *old_map_ptr;
 	struct bucket *b;
 	u32 key_size, hash;
 	int ret;
@@ -1200,7 +1206,8 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		copy_map_value_locked(map,
 				      l_old->key + round_up(key_size, 8),
 				      value, false);
-		ret = 0;
+		/* don't free the reused old element */
+		l_old = NULL;
 		goto err;
 	}
@@ -1216,31 +1223,13 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	 * concurrent search will find it before old elem
 	 */
 	hlist_nulls_add_head_rcu(&l_new->hash_node, head);
-	if (l_old) {
+	if (l_old)
 		hlist_nulls_del_rcu(&l_old->hash_node);
-
-		/* l_old has already been stashed in htab->extra_elems, free
-		 * its special fields before it is available for reuse. Also
-		 * save the old map pointer in htab of maps before unlock
-		 * and release it after unlock.
-		 */
-		old_map_ptr = NULL;
-		if (htab_is_prealloc(htab)) {
-			if (map->ops->map_fd_put_ptr)
-				old_map_ptr = fd_htab_map_get_ptr(map, l_old);
-			check_and_free_fields(htab, l_old);
-		}
-	}
-	htab_unlock_bucket(htab, b, hash, flags);
-
-	if (l_old) {
-		if (old_map_ptr)
-			map->ops->map_fd_put_ptr(map, old_map_ptr, true);
-		if (!htab_is_prealloc(htab))
-			free_htab_elem(htab, l_old, false);
-	}
-	return 0;
 err:
 	htab_unlock_bucket(htab, b, hash, flags);
+	/* refill per-cpu extra_elems for preallocated htab */
+	if (!ret && l_old)
+		free_htab_elem(htab, l_old, true);
 	return ret;
 }

From patchwork Tue Jan 7 08:55:59 2025
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 13928460
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hou Tao
To: bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
 Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
 Stanislav Fomichev, Jiri Olsa, John Fastabend, Sebastian Andrzej Siewior,
 houtao1@huawei.com, xukuohai@huawei.com
Subject: [PATCH bpf-next 7/7] selftests/bpf: Add test case for the freeing of bpf_timer
Date: Tue, 7 Jan 2025 16:55:59 +0800
Message-Id: <20250107085559.3081563-8-houtao@huaweicloud.com>
In-Reply-To: <20250107085559.3081563-1-houtao@huaweicloud.com>
References: <20250107085559.3081563-1-houtao@huaweicloud.com>

From: Hou Tao
The main purpose of the test is to demonstrate the locking problem when
freeing a bpf_timer under PREEMPT_RT. When bpf_timer_cancel_and_free() frees
a bpf_timer that is running on another CPU, hrtimer_cancel() will try to
acquire a spin-lock (namely softirq_expiry_lock), but the freeing procedure
already holds a raw-spin-lock.

The test creates two threads: one to start timers and the other to free
timers. The start-timers thread starts the timers and then wakes up the
free-timers thread to free them once the starts are complete. After freeing,
the free-timers thread wakes up the start-timers thread to finish the current
iteration. A loop of 10 iterations is used.
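The synchronization between the two threads is a simple barrier ping-pong;
condensed from the test below (error handling elided):

	/* start-timers thread (pinned to CPU 1) */
	while (loop-- > 0) {
		bpf_prog_test_run_opts(start_fd, &opts);	/* arm the timers */
		pthread_barrier_wait(&ctx->notify);		/* hand off to freer */
		pthread_barrier_wait(&ctx->notify);		/* wait for free done */
	}

	/* free-timers thread (pinned to CPU 0) */
	while (loop-- > 0) {
		pthread_barrier_wait(&ctx->notify);		/* wait for timers armed */
		bpf_prog_test_run_opts(overwrite_fd, &opts);	/* overwrite => free */
		pthread_barrier_wait(&ctx->notify);		/* let starter continue */
	}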
Signed-off-by: Hou Tao
---
 .../selftests/bpf/prog_tests/free_timer.c | 165 ++++++++++++++++++
 .../testing/selftests/bpf/progs/free_timer.c  |  71 ++++++++
 2 files changed, 236 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/free_timer.c
 create mode 100644 tools/testing/selftests/bpf/progs/free_timer.c

diff --git a/tools/testing/selftests/bpf/prog_tests/free_timer.c b/tools/testing/selftests/bpf/prog_tests/free_timer.c
new file mode 100644
index 000000000000..b7b77a6b2979
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/free_timer.c
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025. Huawei Technologies Co., Ltd */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <pthread.h>
+#include <test_progs.h>
+
+#include "free_timer.skel.h"
+
+struct run_ctx {
+	struct bpf_program *start_prog;
+	struct bpf_program *overwrite_prog;
+	pthread_barrier_t notify;
+	int loop;
+	bool start;
+	bool stop;
+};
+
+static void start_threads(struct run_ctx *ctx)
+{
+	ctx->start = true;
+}
+
+static void stop_threads(struct run_ctx *ctx)
+{
+	ctx->stop = true;
+	/* Guarantee the order between ->stop and ->start */
+	__atomic_store_n(&ctx->start, true, __ATOMIC_RELEASE);
+}
+
+static int wait_for_start(struct run_ctx *ctx)
+{
+	while (!__atomic_load_n(&ctx->start, __ATOMIC_ACQUIRE))
+		usleep(10);
+
+	return ctx->stop;
+}
+
+static void *overwrite_timer_fn(void *arg)
+{
+	struct run_ctx *ctx = arg;
+	int loop, fd, err;
+	cpu_set_t cpuset;
+	long ret = 0;
+
+	/* Pin on CPU 0 */
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+	pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
+
+	/* Is the thread being stopped ? */
+	err = wait_for_start(ctx);
+	if (err)
+		return NULL;
+
+	fd = bpf_program__fd(ctx->overwrite_prog);
+	loop = ctx->loop;
+	while (loop-- > 0) {
+		LIBBPF_OPTS(bpf_test_run_opts, opts);
+
+		/* Wait for start thread to complete */
+		pthread_barrier_wait(&ctx->notify);
+
+		/* Overwrite timers */
+		err = bpf_prog_test_run_opts(fd, &opts);
+		if (err)
+			ret |= 1;
+		else if (opts.retval)
+			ret |= 2;
+
+		/* Notify start thread to start timers */
+		pthread_barrier_wait(&ctx->notify);
+	}
+
+	return (void *)ret;
+}
+
+static void *start_timer_fn(void *arg)
+{
+	struct run_ctx *ctx = arg;
+	int loop, fd, err;
+	cpu_set_t cpuset;
+	long ret = 0;
+
+	/* Pin on CPU 1 */
+	CPU_ZERO(&cpuset);
+	CPU_SET(1, &cpuset);
+	pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
+
+	/* Is the thread being stopped ? */
+	err = wait_for_start(ctx);
+	if (err)
+		return NULL;
+
+	fd = bpf_program__fd(ctx->start_prog);
+	loop = ctx->loop;
+	while (loop-- > 0) {
+		LIBBPF_OPTS(bpf_test_run_opts, opts);
+
+		/* Run the prog to start timer */
+		err = bpf_prog_test_run_opts(fd, &opts);
+		if (err)
+			ret |= 4;
+		else if (opts.retval)
+			ret |= 8;
+
+		/* Notify overwrite thread to do overwrite */
+		pthread_barrier_wait(&ctx->notify);
+
+		/* Wait for overwrite thread to complete */
+		pthread_barrier_wait(&ctx->notify);
+	}
+
+	return (void *)ret;
+}
+
+void test_free_timer(void)
+{
+	struct free_timer *skel;
+	struct bpf_program *prog;
+	struct run_ctx ctx;
+	pthread_t tid[2];
+	void *ret;
+	int err;
+
+	skel = free_timer__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_load"))
+		return;
+
+	memset(&ctx, 0, sizeof(ctx));
+
+	prog = bpf_object__find_program_by_name(skel->obj, "start_timer");
+	if (!ASSERT_OK_PTR(prog, "find start prog"))
+		goto out;
+	ctx.start_prog = prog;
+
+	prog = bpf_object__find_program_by_name(skel->obj, "overwrite_timer");
+	if (!ASSERT_OK_PTR(prog, "find overwrite prog"))
+		goto out;
+	ctx.overwrite_prog = prog;
+
+	pthread_barrier_init(&ctx.notify, NULL, 2);
+	ctx.loop = 10;
+
+	err = pthread_create(&tid[0], NULL, start_timer_fn, &ctx);
+	if (!ASSERT_OK(err, "create start_timer"))
+		goto out;
+
+	err = pthread_create(&tid[1], NULL, overwrite_timer_fn, &ctx);
+	if (!ASSERT_OK(err, "create overwrite_timer")) {
+		stop_threads(&ctx);
+		goto out;
+	}
+
+	start_threads(&ctx);
+
+	ret = NULL;
+	err = pthread_join(tid[0], &ret);
+	ASSERT_EQ(err | (long)ret, 0, "start_timer");
+	ret = NULL;
+	err = pthread_join(tid[1], &ret);
+	ASSERT_EQ(err | (long)ret, 0, "overwrite_timer");
+out:
+	free_timer__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/free_timer.c b/tools/testing/selftests/bpf/progs/free_timer.c
new file mode 100644
index 000000000000..4501ae8fc414
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/free_timer.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025. Huawei Technologies Co., Ltd */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+
+#define MAX_ENTRIES 8
+
+struct map_value {
+	struct bpf_timer timer;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, MAX_ENTRIES);
+} map SEC(".maps");
+
+static int timer_cb(void *map, void *key, struct map_value *value)
+{
+	volatile int sum = 0;
+	int i;
+
+	bpf_for(i, 0, 1024 * 1024)
+		sum += i;
+
+	return 0;
+}
+
+static int start_cb(int key)
+{
+	struct map_value *value;
+
+	value = bpf_map_lookup_elem(&map, (void *)&key);
+	if (!value)
+		return 0;
+
+	bpf_timer_init(&value->timer, &map, CLOCK_MONOTONIC);
+	bpf_timer_set_callback(&value->timer, timer_cb);
+	/* Hope 100us will be enough to wake-up and run the overwrite thread */
+	bpf_timer_start(&value->timer, 100000, BPF_F_TIMER_CPU_PIN);
+
+	return 0;
+}
+
+static int overwrite_cb(int key)
+{
+	struct map_value zero = {};
+
+	/* Free the timer which may run on other CPU */
+	bpf_map_update_elem(&map, (void *)&key, &zero, BPF_ANY);
+
+	return 0;
+}
+
+SEC("syscall")
+int BPF_PROG(start_timer)
+{
+	bpf_loop(MAX_ENTRIES, start_cb, NULL, 0);
+	return 0;
+}
+
+SEC("syscall")
+int BPF_PROG(overwrite_timer)
+{
+	bpf_loop(MAX_ENTRIES, overwrite_cb, NULL, 0);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
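For reference, the test runs under the standard selftests runner (e.g.
./test_progs -t free_timer); the splat it is designed to provoke is only
expected on a PREEMPT_RT kernel without the preceding fixes applied.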