From patchwork Thu Nov 18 01:04:02 2021
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 12625813
X-Patchwork-Delegate: bpf@iogearbox.net
From: Joanne Koong
Subject: [PATCH bpf-next 1/3] bpf: Add bpf_for_each helper
Date: Wed, 17 Nov 2021 17:04:02 -0800
Message-ID: <20211118010404.2415864-2-joannekoong@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20211118010404.2415864-1-joannekoong@fb.com>
References: <20211118010404.2415864-1-joannekoong@fb.com>
X-Mailing-List: bpf@vger.kernel.org

This patch adds the kernel-side and API changes for a new helper
function, bpf_for_each:

  long bpf_for_each(u32 nr_iterations, void *callback_fn,
                    void *callback_ctx, u64 flags);

bpf_for_each invokes "callback_fn" nr_iterations times, or until
callback_fn returns 1. A few things to note:

~ The "u64 flags" parameter is currently unused but is included in case
  a future use case for it arises.
~ In the kernel-side implementation of bpf_for_each
  (kernel/bpf/bpf_iter.c), bpf_callback_t is used as the callback
  function cast.
~ A program can have nested bpf_for_each calls, but it must still
  adhere to the verifier's stack depth constraint (the stack depth
  cannot exceed MAX_BPF_STACK).
~ The next patch contains the tests and the benchmark.

Signed-off-by: Joanne Koong
Acked-by: Toke Høiland-Jørgensen
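For illustration, a minimal sketch of how a BPF program might use the
new helper (the names below are made up for this example and assume
helper definitions regenerated from the updated UAPI header; the
selftests in the next patch are the authoritative usage):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  struct sum_ctx {
          int sum;
  };

  /* callback_fn: return 0 to continue, 1 to stop iterating early */
  static int add_cb(__u32 index, void *data)
  {
          struct sum_ctx *ctx = data;

          ctx->sum += index;
          return 0;
  }

  int result;

  SEC("tc")
  int sum_prog(struct __sk_buff *skb)
  {
          struct sum_ctx ctx = {};

          /* calls add_cb with index 0..9 and &ctx (on the stack) as ctx */
          bpf_for_each(10, add_cb, &ctx, 0);
          result = ctx.sum;       /* 0 + 1 + ... + 9 = 45 */
          return 0;
  }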
--- include/linux/bpf.h | 1 + include/uapi/linux/bpf.h | 23 +++++++++++++++++++++++ kernel/bpf/bpf_iter.c | 32 ++++++++++++++++++++++++++++++++ kernel/bpf/helpers.c | 2 ++ kernel/bpf/verifier.c | 28 ++++++++++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 23 +++++++++++++++++++++++ 6 files changed, 109 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6deebf8bf78f..d9b69a896c91 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2107,6 +2107,7 @@ extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto; extern const struct bpf_func_proto bpf_task_storage_get_proto; extern const struct bpf_func_proto bpf_task_storage_delete_proto; extern const struct bpf_func_proto bpf_for_each_map_elem_proto; +extern const struct bpf_func_proto bpf_for_each_proto; extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto; extern const struct bpf_func_proto bpf_sk_setsockopt_proto; extern const struct bpf_func_proto bpf_sk_getsockopt_proto; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index bd0c9f0487f6..ea5098920ed2 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -4750,6 +4750,28 @@ union bpf_attr { * The number of traversed map elements for success, **-EINVAL** for * invalid **flags**. * + * long bpf_for_each(u32 nr_iterations, void *callback_fn, void *callback_ctx, u64 flags) + * Description + * For **nr_iterations**, call **callback_fn** function with + * **callback_ctx** as the context parameter. + * The **callback_fn** should be a static function and + * the **callback_ctx** should be a pointer to the stack. + * The **flags** is used to control certain aspects of the helper. + * Currently, the **flags** must be 0. + * + * long (\*callback_fn)(u32 index, void \*ctx); + * + * where **index** is the current index in the iteration. The index + * is zero-indexed. + * + * If **callback_fn** returns 0, the helper will continue to the next + * iteration. If return value is 1, the helper will skip the rest of + * the iterations and return. Other return values are not used now. + * + * Return + * The number of iterations performed, **-EINVAL** for invalid **flags** + * or a null **callback_fn**.
+ * * long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data, u32 data_len) * Description * Outputs a string into the **str** buffer of size **str_size** @@ -5105,6 +5127,7 @@ union bpf_attr { FN(sock_from_file), \ FN(check_mtu), \ FN(for_each_map_elem), \ + FN(for_each), \ FN(snprintf), \ FN(sys_bpf), \ FN(btf_find_by_name_kind), \ diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index b2ee45064e06..cb742c50898a 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -714,3 +714,35 @@ const struct bpf_func_proto bpf_for_each_map_elem_proto = { .arg3_type = ARG_PTR_TO_STACK_OR_NULL, .arg4_type = ARG_ANYTHING, }; + +BPF_CALL_4(bpf_for_each, u32, nr_iterations, void *, callback_fn, void *, callback_ctx, + u64, flags) +{ + bpf_callback_t callback = (bpf_callback_t)callback_fn; + u64 err; + u32 i; + + if (flags) + return -EINVAL; + + for (i = 0; i < nr_iterations; i++) { + err = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0); + /* return value: 0 - continue, 1 - stop and return */ + if (err) { + i++; + break; + } + } + + return i; +} + +const struct bpf_func_proto bpf_for_each_proto = { + .func = bpf_for_each, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_ANYTHING, + .arg2_type = ARG_PTR_TO_FUNC, + .arg3_type = ARG_PTR_TO_STACK_OR_NULL, + .arg4_type = ARG_ANYTHING, +}; diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 1ffd469c217f..ac10e1b3859f 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1378,6 +1378,8 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_ringbuf_query_proto; case BPF_FUNC_for_each_map_elem: return &bpf_for_each_map_elem_proto; + case BPF_FUNC_for_each: + return &bpf_for_each_proto; default: break; } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3c8aa7df1773..7c9cc4f01104 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6103,6 +6103,27 @@ static int set_map_elem_callback_state(struct bpf_verifier_env *env, return 0; } +static int set_for_each_callback_state(struct bpf_verifier_env *env, + struct bpf_func_state *caller, + struct bpf_func_state *callee, + int insn_idx) +{ + /* bpf_for_each(u32 nr_iterations, void *callback_fn, void *callback_ctx, + * u64 flags); + * callback_fn(u32 index, void *callback_ctx); + */ + callee->regs[BPF_REG_1].type = SCALAR_VALUE; + callee->regs[BPF_REG_2] = caller->regs[BPF_REG_3]; + + /* unused */ + __mark_reg_not_init(env, &callee->regs[BPF_REG_3]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); + + callee->in_callback_fn = true; + return 0; +} + static int set_timer_callback_state(struct bpf_verifier_env *env, struct bpf_func_state *caller, struct bpf_func_state *callee, @@ -6482,6 +6503,13 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn return -EINVAL; } + if (func_id == BPF_FUNC_for_each) { + err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, + set_for_each_callback_state); + if (err < 0) + return -EINVAL; + } + if (func_id == BPF_FUNC_timer_set_callback) { err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, set_timer_callback_state); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index bd0c9f0487f6..ea5098920ed2 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -4750,6 +4750,28 @@ union bpf_attr { * The number of traversed map elements for success, **-EINVAL** for * invalid **flags**. 
* + * long bpf_for_each(u32 nr_iterations, void *callback_fn, void *callback_ctx, u64 flags) + * Description + * For **nr_iterations**, call **callback_fn** function with + * **callback_ctx** as the context parameter. + * The **callback_fn** should be a static function and + * the **callback_ctx** should be a pointer to the stack. + * The **flags** is used to control certain aspects of the helper. + * Currently, the **flags** must be 0. + * + * long (\*callback_fn)(u32 index, void \*ctx); + * + * where **index** is the current index in the iteration. The index + * is zero-indexed. + * + * If **callback_fn** returns 0, the helper will continue to the next + * iteration. If return value is 1, the helper will skip the rest of + * the iterations and return. Other return values are not used now. + * + * Return + * The number of iterations performed, **-EINVAL** for invalid **flags** + * or a null **callback_fn**. + * * long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data, u32 data_len) * Description * Outputs a string into the **str** buffer of size **str_size** @@ -5105,6 +5127,7 @@ union bpf_attr { FN(sock_from_file), \ FN(check_mtu), \ FN(for_each_map_elem), \ + FN(for_each), \ FN(snprintf), \ FN(sys_bpf), \ FN(btf_find_by_name_kind), \
From patchwork Thu Nov 18 01:04:03 2021
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 12625815
X-Patchwork-Delegate: bpf@iogearbox.net
From: Joanne Koong
Subject: [PATCH bpf-next 2/3] selftests/bpf: Add tests for bpf_for_each
Date: Wed, 17 Nov 2021 17:04:03 -0800
Message-ID: <20211118010404.2415864-3-joannekoong@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20211118010404.2415864-1-joannekoong@fb.com>
References: <20211118010404.2415864-1-joannekoong@fb.com>
X-Mailing-List: bpf@vger.kernel.org

In this patch:

1) Add a new prog "for_each_helper" which tests the basic functionality
   of the bpf_for_each helper.
2) Add pyperf600_foreach and strobemeta_foreach to test the performance
   of using bpf_for_each instead of an unrolled for loop.

The results of pyperf600 and strobemeta are as follows:

~strobemeta~
Baseline
    verification time 6808200 usec
    stack depth 496
    processed 592132 insns (limit 1000000) max_states_per_insn 14
    total_states 16018 peak_states 13684 mark_read 3132
    #188 verif_scale_strobemeta:OK (unrolled loop)

Using bpf_for_each
    verification time 31589 usec
    stack depth 96+408
    processed 1630 insns (limit 1000000) max_states_per_insn 4
    total_states 107 peak_states 107 mark_read 60
    #189 verif_scale_strobemeta_foreach:OK

~pyperf600~
Baseline
    verification time 29702486 usec
    stack depth 368
    processed 626838 insns (limit 1000000) max_states_per_insn 7
    total_states 30368 peak_states 30279 mark_read 748
    #182 verif_scale_pyperf600:OK (unrolled loop)

Using bpf_for_each
    verification time 148488 usec
    stack depth 320+40
    processed 10518 insns (limit 1000000) max_states_per_insn 10
    total_states 705 peak_states 517 mark_read 38
    #183 verif_scale_pyperf600_foreach:OK

Using the bpf_for_each helper cut verification time and the number of
processed instructions by roughly 99% in both cases (e.g. strobemeta
drops from 6808200 usec / 592132 insns to 31589 usec / 1630 insns).
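The conversion pattern used in pyperf.h and strobemeta.h is roughly the
following (condensed, with hypothetical names; the real changes are in
the diff below). The key point is that the loop body becomes a static
callback that the verifier checks once as a subprogram, instead of
simulating every unrolled copy of the body:

  struct conv_ctx {
          long sum;
  };

  /* Before: fully unrolled, so the verifier walks STACK_MAX_LEN copies
   * of the loop body.
   */
  #pragma clang loop unroll(full)
  for (int i = 0; i < STACK_MAX_LEN; ++i)
          c.sum += i;

  /* After: the body moves into a callback and the iteration happens
   * inside the helper.
   */
  static int body_cb(__u32 i, void *data)
  {
          struct conv_ctx *c = data;

          c->sum += i;
          return 0;       /* return 1 to stop early */
  }

  ...
  bpf_for_each(STACK_MAX_LEN, body_cb, &c, 0);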
Signed-off-by: Joanne Koong --- .../bpf/prog_tests/bpf_verif_scale.c | 12 +++ .../selftests/bpf/prog_tests/for_each.c | 61 ++++++++++++++++ .../selftests/bpf/progs/for_each_helper.c | 69 ++++++++++++++++++ tools/testing/selftests/bpf/progs/pyperf.h | 70 +++++++++++++++++- .../selftests/bpf/progs/pyperf600_foreach.c | 5 ++ .../testing/selftests/bpf/progs/strobemeta.h | 73 ++++++++++++++++++- .../selftests/bpf/progs/strobemeta_foreach.c | 9 +++ 7 files changed, 295 insertions(+), 4 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/for_each_helper.c create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_foreach.c create mode 100644 tools/testing/selftests/bpf/progs/strobemeta_foreach.c diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c index 867349e4ed9e..77396484fde7 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c @@ -115,6 +115,12 @@ void test_verif_scale_pyperf600() scale_test("pyperf600.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); } +void test_verif_scale_pyperf600_foreach(void) +{ + /* use the bpf_for_each helper*/ + scale_test("pyperf600_foreach.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); +} + void test_verif_scale_pyperf600_nounroll() { /* no unroll at all. @@ -165,6 +171,12 @@ void test_verif_scale_strobemeta() scale_test("strobemeta.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); } +void test_verif_scale_strobemeta_foreach(void) +{ + /* use the bpf_for_each helper*/ + scale_test("strobemeta_foreach.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); +} + void test_verif_scale_strobemeta_nounroll1() { /* no unroll, tiny loops */ diff --git a/tools/testing/selftests/bpf/prog_tests/for_each.c b/tools/testing/selftests/bpf/prog_tests/for_each.c index 68eb12a287d4..529573a82334 100644 --- a/tools/testing/selftests/bpf/prog_tests/for_each.c +++ b/tools/testing/selftests/bpf/prog_tests/for_each.c @@ -4,6 +4,7 @@ #include #include "for_each_hash_map_elem.skel.h" #include "for_each_array_map_elem.skel.h" +#include "for_each_helper.skel.h" static unsigned int duration; @@ -121,10 +122,70 @@ static void test_array_map(void) for_each_array_map_elem__destroy(skel); } +static void test_for_each_helper(void) +{ + struct for_each_helper *skel; + __u32 retval; + int err; + + skel = for_each_helper__open_and_load(); + if (!ASSERT_OK_PTR(skel, "for_each_helper__open_and_load")) + return; + + skel->bss->nr_iterations = 100; + err = bpf_prog_test_run(bpf_program__fd(skel->progs.test_prog), + 1, &pkt_v4, sizeof(pkt_v4), NULL, NULL, + &retval, &duration); + if (CHECK(err || retval, "bpf_for_each helper test_prog", + "err %d errno %d retval %d\n", err, errno, retval)) + goto out; + ASSERT_EQ(skel->bss->nr_iterations_completed, skel->bss->nr_iterations, + "nr_iterations mismatch"); + ASSERT_EQ(skel->bss->g_output, (100 * 99) / 2, "wrong output"); + + /* test callback_fn returning 1 to stop iteration */ + skel->bss->nr_iterations = 400; + skel->data->stop_index = 50; + err = bpf_prog_test_run(bpf_program__fd(skel->progs.test_prog), + 1, &pkt_v4, sizeof(pkt_v4), NULL, NULL, + &retval, &duration); + if (CHECK(err || retval, "bpf_for_each helper test_prog", + "err %d errno %d retval %d\n", err, errno, retval)) + goto out; + ASSERT_EQ(skel->bss->nr_iterations_completed, skel->data->stop_index + 1, + "stop_index not followed"); + ASSERT_EQ(skel->bss->g_output, (50 * 49) / 2, "wrong output"); + + /* test passing in a null ctx */ + 
skel->bss->nr_iterations = 10; + err = bpf_prog_test_run(bpf_program__fd(skel->progs.prog_null_ctx), + 1, &pkt_v4, sizeof(pkt_v4), NULL, NULL, + &retval, &duration); + if (CHECK(err || retval, "bpf_for_each helper prog_null_ctx", + "err %d errno %d retval %d\n", err, errno, retval)) + goto out; + ASSERT_EQ(skel->bss->nr_iterations_completed, skel->bss->nr_iterations, + "nr_iterations mismatch"); + + /* test invalid flags */ + err = bpf_prog_test_run(bpf_program__fd(skel->progs.prog_invalid_flags), + 1, &pkt_v4, sizeof(pkt_v4), NULL, NULL, + &retval, &duration); + if (CHECK(err || retval, "bpf_for_each helper prog_invalid_flags", + "err %d errno %d retval %d\n", err, errno, retval)) + goto out; + ASSERT_EQ(skel->bss->err, -EINVAL, "invalid_flags"); + +out: + for_each_helper__destroy(skel); +} + void test_for_each(void) { if (test__start_subtest("hash_map")) test_hash_map(); if (test__start_subtest("array_map")) test_array_map(); + if (test__start_subtest("for_each_helper")) + test_for_each_helper(); } diff --git a/tools/testing/selftests/bpf/progs/for_each_helper.c b/tools/testing/selftests/bpf/progs/for_each_helper.c new file mode 100644 index 000000000000..4404d0cb32a6 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/for_each_helper.c @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2021 Facebook */ + +#include "vmlinux.h" +#include + +char _license[] SEC("license") = "GPL"; + +struct callback_ctx { + int output; +}; + +/* This should be set by the user program */ +u32 nr_iterations; +u32 stop_index = -1; + +/* Making these global variables so that the userspace program + * can verify the output through the skeleton + */ +int nr_iterations_completed; +int g_output; +int err; + +static int callback_fn(__u32 index, void *data) +{ + struct callback_ctx *ctx = data; + + if (index >= stop_index) + return 1; + + ctx->output += index; + + return 0; +} + +static int empty_callback_fn(__u32 index, void *data) +{ + return 0; +} + +SEC("tc") +int test_prog(struct __sk_buff *skb) +{ + struct callback_ctx data = {}; + + nr_iterations_completed = bpf_for_each(nr_iterations, callback_fn, &data, 0); + + g_output = data.output; + + return 0; +} + +SEC("tc") +int prog_null_ctx(struct __sk_buff *skb) +{ + nr_iterations_completed = bpf_for_each(nr_iterations, empty_callback_fn, NULL, 0); + + return 0; +} + +SEC("tc") +int prog_invalid_flags(struct __sk_buff *skb) +{ + struct callback_ctx data = {}; + + err = bpf_for_each(nr_iterations, callback_fn, &data, 1); + + return 0; +} diff --git a/tools/testing/selftests/bpf/progs/pyperf.h b/tools/testing/selftests/bpf/progs/pyperf.h index 2fb7adafb6b6..186735e2f385 100644 --- a/tools/testing/selftests/bpf/progs/pyperf.h +++ b/tools/testing/selftests/bpf/progs/pyperf.h @@ -159,6 +159,57 @@ struct { __uint(value_size, sizeof(long long) * 127); } stackmap SEC(".maps"); +struct process_frame_ctx { + int cur_cpu; + int32_t *symbol_counter; + void *frame_ptr; + FrameData *frame; + PidData *pidData; + Symbol *sym; + Event *event; + bool done; +}; + +#define barrier_var(var) asm volatile("" : "=r"(var) : "0"(var)) + +static int process_frame_callback(__u32 i, struct process_frame_ctx *ctx) +{ + int zero = 0; + void *frame_ptr = ctx->frame_ptr; + PidData *pidData = ctx->pidData; + FrameData *frame = ctx->frame; + int32_t *symbol_counter = ctx->symbol_counter; + int cur_cpu = ctx->cur_cpu; + Event *event = ctx->event; + Symbol *sym = ctx->sym; + + if (frame_ptr && get_frame_data(frame_ptr, pidData, frame, sym)) { + int32_t new_symbol_id = 
*symbol_counter * 64 + cur_cpu; + int32_t *symbol_id = bpf_map_lookup_elem(&symbolmap, sym); + + if (!symbol_id) { + bpf_map_update_elem(&symbolmap, sym, &zero, 0); + symbol_id = bpf_map_lookup_elem(&symbolmap, sym); + if (!symbol_id) { + ctx->done = true; + return 1; + } + } + if (*symbol_id == new_symbol_id) + (*symbol_counter)++; + + barrier_var(i); + if (i >= STACK_MAX_LEN) + return 1; + + event->stack[i] = *symbol_id; + + event->stack_len = i + 1; + frame_ptr = frame->f_back; + } + return 0; +} + #ifdef GLOBAL_FUNC __noinline #elif defined(SUBPROGS) @@ -228,11 +279,27 @@ int __on_event(struct bpf_raw_tracepoint_args *ctx) int32_t* symbol_counter = bpf_map_lookup_elem(&symbolmap, &sym); if (symbol_counter == NULL) return 0; +#ifdef USE_FOREACH + struct process_frame_ctx ctx; + + ctx.cur_cpu = cur_cpu; + ctx.symbol_counter = symbol_counter; + ctx.frame_ptr = frame_ptr; + ctx.frame = &frame; + ctx.pidData = pidData; + ctx.sym = &sym; + ctx.event = event; + ctx.done = false; + + bpf_for_each(STACK_MAX_LEN, process_frame_callback, &ctx, 0); + if (ctx.done) + return 0; +#else #ifdef NO_UNROLL #pragma clang loop unroll(disable) #else #pragma clang loop unroll(full) -#endif +#endif /* NO_UNROLL */ /* Unwind python stack */ for (int i = 0; i < STACK_MAX_LEN; ++i) { if (frame_ptr && get_frame_data(frame_ptr, pidData, &frame, &sym)) { @@ -251,6 +318,7 @@ int __on_event(struct bpf_raw_tracepoint_args *ctx) frame_ptr = frame.f_back; } } +#endif /* USE_FOREACH */ event->stack_complete = frame_ptr == NULL; } else { event->stack_complete = 1; diff --git a/tools/testing/selftests/bpf/progs/pyperf600_foreach.c b/tools/testing/selftests/bpf/progs/pyperf600_foreach.c new file mode 100644 index 000000000000..1b43f12a7d54 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/pyperf600_foreach.c @@ -0,0 +1,5 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2021 Facebook +#define STACK_MAX_LEN 600 +#define USE_FOREACH +#include "pyperf.h" diff --git a/tools/testing/selftests/bpf/progs/strobemeta.h b/tools/testing/selftests/bpf/progs/strobemeta.h index 7de534f38c3f..40069e797930 100644 --- a/tools/testing/selftests/bpf/progs/strobemeta.h +++ b/tools/testing/selftests/bpf/progs/strobemeta.h @@ -445,6 +445,46 @@ static __always_inline void *read_map_var(struct strobemeta_cfg *cfg, return payload; } +enum read_type { + READ_INT_VAR, + READ_MAP_VAR, + READ_STR_VAR, +}; + +struct read_var_ctx { + struct strobemeta_payload *data; + void *tls_base; + struct strobemeta_cfg *cfg; + void *payload; + /* value gets mutated */ + struct strobe_value_generic *value; + enum read_type type; +}; + +static int read_var_callback(__u32 index, struct read_var_ctx *ctx) +{ + switch (ctx->type) { + case READ_INT_VAR: + if (index >= STROBE_MAX_INTS) + return 1; + read_int_var(ctx->cfg, index, ctx->tls_base, ctx->value, ctx->data); + break; + case READ_MAP_VAR: + if (index >= STROBE_MAX_MAPS) + return 1; + ctx->payload = read_map_var(ctx->cfg, index, ctx->tls_base, + ctx->value, ctx->data, ctx->payload); + break; + case READ_STR_VAR: + if (index >= STROBE_MAX_STRS) + return 1; + ctx->payload += read_str_var(ctx->cfg, index, ctx->tls_base, + ctx->value, ctx->data, ctx->payload); + break; + } + return 0; +} + /* * read_strobe_meta returns NULL, if no metadata was read; otherwise returns * pointer to *right after* payload ends @@ -475,11 +515,36 @@ static void *read_strobe_meta(struct task_struct *task, */ tls_base = (void *)task; +#ifdef USE_FOREACH + struct read_var_ctx ctx; + int err; + + ctx.cfg = cfg; + ctx.tls_base = 
tls_base; + ctx.value = &value; + ctx.data = data; + ctx.payload = payload; + ctx.type = READ_INT_VAR; + + err = bpf_for_each(STROBE_MAX_INTS, read_var_callback, &ctx, 0); + if (err != STROBE_MAX_INTS) + return NULL; + + ctx.type = READ_STR_VAR; + err = bpf_for_each(STROBE_MAX_STRS, read_var_callback, &ctx, 0); + if (err != STROBE_MAX_STRS) + return NULL; + + ctx.type = READ_MAP_VAR; + err = bpf_for_each(STROBE_MAX_MAPS, read_var_callback, &ctx, 0); + if (err != STROBE_MAX_MAPS) + return NULL; +#else #ifdef NO_UNROLL #pragma clang loop unroll(disable) #else #pragma unroll -#endif +#endif /* NO_UNROLL */ for (int i = 0; i < STROBE_MAX_INTS; ++i) { read_int_var(cfg, i, tls_base, &value, data); } @@ -487,7 +552,7 @@ static void *read_strobe_meta(struct task_struct *task, #pragma clang loop unroll(disable) #else #pragma unroll -#endif +#endif /* NO_UNROLL */ for (int i = 0; i < STROBE_MAX_STRS; ++i) { payload += read_str_var(cfg, i, tls_base, &value, data, payload); } @@ -495,10 +560,12 @@ #pragma clang loop unroll(disable) #else #pragma unroll -#endif +#endif /* NO_UNROLL */ for (int i = 0; i < STROBE_MAX_MAPS; ++i) { payload = read_map_var(cfg, i, tls_base, &value, data, payload); } +#endif /* USE_FOREACH */ + /* * return pointer right after end of payload, so it's possible to * calculate exact amount of useful data that needs to be sent diff --git a/tools/testing/selftests/bpf/progs/strobemeta_foreach.c b/tools/testing/selftests/bpf/progs/strobemeta_foreach.c new file mode 100644 index 000000000000..a2d57d2d68ad --- /dev/null +++ b/tools/testing/selftests/bpf/progs/strobemeta_foreach.c @@ -0,0 +1,9 @@ +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) +// Copyright (c) 2021 Facebook + +#define STROBE_MAX_INTS 2 +#define STROBE_MAX_STRS 25 +#define STROBE_MAX_MAPS 100 +#define STROBE_MAX_MAP_ENTRIES 20 +#define USE_FOREACH +#include "strobemeta.h"
From patchwork Thu Nov 18 01:04:04 2021
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 12625817
X-Patchwork-Delegate: bpf@iogearbox.net
From: Joanne Koong
Subject: [PATCH bpf-next 3/3] selftest/bpf/benchs: add bpf_for_each benchmark
Date: Wed, 17 Nov 2021 17:04:04 -0800
Message-ID: <20211118010404.2415864-4-joannekoong@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20211118010404.2415864-1-joannekoong@fb.com>
References: <20211118010404.2415864-1-joannekoong@fb.com>
X-Mailing-List: bpf@vger.kernel.org

Add a benchmark to measure the overhead of the bpf_for_each call for a
specified number of iterations.

Testing this on qemu on my dev machine on 1 thread, the data is as
follows:

nr_iterations: 1        bpf_for_each helper - total callbacks called: 42.949 ± 1.404M/s
nr_iterations: 10       bpf_for_each helper - total callbacks called: 73.645 ± 2.077M/s
nr_iterations: 100      bpf_for_each helper - total callbacks called: 73.058 ± 1.256M/s
nr_iterations: 500      bpf_for_each helper - total callbacks called: 78.255 ± 2.845M/s
nr_iterations: 1000     bpf_for_each helper - total callbacks called: 79.439 ± 1.805M/s
nr_iterations: 5000     bpf_for_each helper - total callbacks called: 81.639 ± 2.053M/s
nr_iterations: 10000    bpf_for_each helper - total callbacks called: 80.577 ± 1.824M/s
nr_iterations: 50000    bpf_for_each helper - total callbacks called: 76.773 ± 1.578M/s
nr_iterations: 100000   bpf_for_each helper - total callbacks called: 77.073 ± 2.200M/s
nr_iterations: 500000   bpf_for_each helper - total callbacks called: 75.136 ± 0.552M/s
nr_iterations: 1000000  bpf_for_each helper - total callbacks called: 76.364 ± 1.690M/s

From this data, we can see that the loop runs at least 40 million times
per second with an empty callback function, and that as the number of
iterations increases, the per-iteration overhead decreases and settles
toward a constant value.
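For context, a rate of ~76M callbacks/s at the larger iteration counts
works out to roughly 13 ns per bpf_for_each iteration (1 / 76.4M per
second ≈ 13.1 ns), while the nr_iterations=1 case (~43M/s, ≈ 23 ns)
also absorbs the fixed per-call cost of entering the helper. To
reproduce, the benchmark can be run through the wrapper script added
below, or directly via the bench binary (a sketch, assuming the usual
bench runner flags; --nr_iters is added by this patch):

  ./benchs/run_bench_for_each.sh
  # or a single configuration:
  ./bench -p 1 --nr_iters 1000 for-each-helper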
Signed-off-by: Joanne Koong --- tools/testing/selftests/bpf/Makefile | 3 +- tools/testing/selftests/bpf/bench.c | 4 + .../selftests/bpf/benchs/bench_for_each.c | 105 ++++++++++++++++++ .../bpf/benchs/run_bench_for_each.sh | 16 +++ .../selftests/bpf/progs/for_each_helper.c | 13 +++ 5 files changed, 140 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/benchs/bench_for_each.c create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_for_each.sh diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index f49cb5fc85af..b55fc72b8ef0 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -537,7 +537,8 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o $(OUTPUT)/testing_helpers.o \ $(OUTPUT)/bench_rename.o \ $(OUTPUT)/bench_trigger.o \ $(OUTPUT)/bench_ringbufs.o \ - $(OUTPUT)/bench_bloom_filter_map.o + $(OUTPUT)/bench_bloom_filter_map.o \ + $(OUTPUT)/bench_for_each.o $(call msg,BINARY,,$@) $(Q)$(CC) $(LDFLAGS) -o $@ $(filter %.a %.o,$^) $(LDLIBS) diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c index cc4722f693e9..d8b3d537a700 100644 --- a/tools/testing/selftests/bpf/bench.c +++ b/tools/testing/selftests/bpf/bench.c @@ -171,10 +171,12 @@ static const struct argp_option opts[] = { extern struct argp bench_ringbufs_argp; extern struct argp bench_bloom_map_argp; +extern struct argp bench_for_each_argp; static const struct argp_child bench_parsers[] = { { &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 }, { &bench_bloom_map_argp, 0, "Bloom filter map benchmark", 0 }, + { &bench_for_each_argp, 0, "bpf_for_each helper benchmark", 0 }, {}, }; @@ -368,6 +370,7 @@ extern const struct bench bench_bloom_update; extern const struct bench bench_bloom_false_positive; extern const struct bench bench_hashmap_without_bloom; extern const struct bench bench_hashmap_with_bloom; +extern const struct bench bench_for_each_helper; static const struct bench *benchs[] = { &bench_count_global, @@ -394,6 +397,7 @@ static const struct bench *benchs[] = { &bench_bloom_false_positive, &bench_hashmap_without_bloom, &bench_hashmap_with_bloom, + &bench_for_each_helper, }; static void setup_benchmark() diff --git a/tools/testing/selftests/bpf/benchs/bench_for_each.c b/tools/testing/selftests/bpf/benchs/bench_for_each.c new file mode 100644 index 000000000000..3372d5b7d67b --- /dev/null +++ b/tools/testing/selftests/bpf/benchs/bench_for_each.c @@ -0,0 +1,105 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2021 Facebook */ + +#include +#include "bench.h" +#include "for_each_helper.skel.h" + +/* BPF triggering benchmarks */ +static struct ctx { + struct for_each_helper *skel; +} ctx; + +static struct { + __u32 nr_iters; +} args = { + .nr_iters = 10, +}; + +enum { + ARG_NR_ITERS = 4000, +}; + +static const struct argp_option opts[] = { + { "nr_iters", ARG_NR_ITERS, "nr_iters", 0, + "Set number of iterations for the bpf_for_each helper"}, + {}, +}; + +static error_t parse_arg(int key, char *arg, struct argp_state *state) +{ + switch (key) { + case ARG_NR_ITERS: + args.nr_iters = strtol(arg, NULL, 10); + break; + default: + return ARGP_ERR_UNKNOWN; + } + + return 0; +} + +/* exported into benchmark runner */ +const struct argp bench_for_each_argp = { + .options = opts, + .parser = parse_arg, +}; + +static void validate(void) +{ + if (env.consumer_cnt != 1) { + fprintf(stderr, "benchmark doesn't support multi-consumer!\n"); + exit(1); + } +} + +static void *producer(void *input) +{ + while 
(true) + /* trigger the bpf program */ + syscall(__NR_getpgid); + + return NULL; +} + +static void *consumer(void *input) +{ + return NULL; +} + +static void measure(struct bench_res *res) +{ + res->hits = atomic_swap(&ctx.skel->bss->hits, 0); +} + +static void setup(void) +{ + struct bpf_link *link; + + setup_libbpf(); + + ctx.skel = for_each_helper__open_and_load(); + if (!ctx.skel) { + fprintf(stderr, "failed to open skeleton\n"); + exit(1); + } + + link = bpf_program__attach(ctx.skel->progs.benchmark); + if (!link) { + fprintf(stderr, "failed to attach program!\n"); + exit(1); + } + + ctx.skel->bss->nr_iterations = args.nr_iters; +} + +const struct bench bench_for_each_helper = { + .name = "for-each-helper", + .validate = validate, + .setup = setup, + .producer_thread = producer, + .consumer_thread = consumer, + .measure = measure, + .report_progress = hits_drops_report_progress, + .report_final = hits_drops_report_final, +}; diff --git a/tools/testing/selftests/bpf/benchs/run_bench_for_each.sh b/tools/testing/selftests/bpf/benchs/run_bench_for_each.sh new file mode 100755 index 000000000000..5f11a1ad66d3 --- /dev/null +++ b/tools/testing/selftests/bpf/benchs/run_bench_for_each.sh @@ -0,0 +1,16 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source ./benchs/run_common.sh + +set -eufo pipefail + +for t in 1 4 8 12 16; do +printf "\n" +for i in 1 10 100 500 1000 5000 10000 50000 100000 500000 1000000; do +subtitle "nr_iterations: $i, nr_threads: $t" + summarize "bpf_for_each helper - total callbacks called: " \ + "$($RUN_BENCH -p $t --nr_iters $i for-each-helper)" + printf "\n" +done +done diff --git a/tools/testing/selftests/bpf/progs/for_each_helper.c b/tools/testing/selftests/bpf/progs/for_each_helper.c index 4404d0cb32a6..b95551d99f75 100644 --- a/tools/testing/selftests/bpf/progs/for_each_helper.c +++ b/tools/testing/selftests/bpf/progs/for_each_helper.c @@ -14,6 +14,8 @@ struct callback_ctx { u32 nr_iterations; u32 stop_index = -1; +long hits; + /* Making these global variables so that the userspace program * can verify the output through the skeleton */ @@ -67,3 +69,14 @@ int prog_invalid_flags(struct __sk_buff *skb) return 0; } + +SEC("fentry/__x64_sys_getpgid") +int benchmark(void *ctx) +{ + for (int i = 0; i < 1000; i++) { + bpf_for_each(nr_iterations, empty_callback_fn, NULL, 0); + + __sync_add_and_fetch(&hits, nr_iterations); + } + return 0; +}