diff mbox series

[bpf-next] selftests/bpf: Fix flaky test btf_map_in_map/lookup_update

Message ID 20240322050716.357983-1-yonghong.song@linux.dev (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series [bpf-next] selftests/bpf: Fix flaky test btf_map_in_map/lookup_update | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 11 maintainers not CCed: haoluo@google.com linux-kselftest@vger.kernel.org john.fastabend@gmail.com eddyz87@gmail.com sdf@google.com song@kernel.org kpsingh@kernel.org shuah@kernel.org martin.lau@linux.dev jolsa@kernel.org mykolal@fb.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 30 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Yonghong Song March 22, 2024, 5:07 a.m. UTC
Recently, I frequently hit the following test failure:

  [root@arch-fb-vm1 bpf]# ./test_progs -n 33/1
  test_lookup_update:PASS:skel_open 0 nsec
  ...
  test_lookup_update:PASS:sync_rcu 0 nsec
  test_lookup_update:FAIL:map1_leak inner_map1 leaked!
  #33/1    btf_map_in_map/lookup_update:FAIL
  #33      btf_map_in_map:FAIL

In the test, after map is closed and then after two rcu grace periods,
it is assumed that map_id is not available to user space.

But the above assumption cannot be guaranteed. After zero or one
or two rcu grace periods in different siturations, the actual
freeing-map-work is put into a workqueue. Later on, when the work
is dequeued, the map will be actually freed.
See bpf_map_put() in kernel/bpf/syscall.c.

By using workqueue, there is no ganrantee that map will be actually
freed after a couple of rcu grace periods. This patch removed
such map leak detection and then the test can pass consistently.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/btf_map_in_map.c | 24 -------------------
 1 file changed, 24 deletions(-)

Comments

Yonghong Song March 22, 2024, 2:25 p.m. UTC | #1
On 3/21/24 10:07 PM, Yonghong Song wrote:
> Recently, I frequently hit the following test failure:
>
>    [root@arch-fb-vm1 bpf]# ./test_progs -n 33/1
>    test_lookup_update:PASS:skel_open 0 nsec
>    ...
>    test_lookup_update:PASS:sync_rcu 0 nsec
>    test_lookup_update:FAIL:map1_leak inner_map1 leaked!
>    #33/1    btf_map_in_map/lookup_update:FAIL
>    #33      btf_map_in_map:FAIL
>
> In the test, after map is closed and then after two rcu grace periods,
> it is assumed that map_id is not available to user space.
>
> But the above assumption cannot be guaranteed. After zero or one
> or two rcu grace periods in different siturations, the actual
> freeing-map-work is put into a workqueue. Later on, when the work
> is dequeued, the map will be actually freed.
> See bpf_map_put() in kernel/bpf/syscall.c.
>
> By using workqueue, there is no ganrantee that map will be actually
> freed after a couple of rcu grace periods. This patch removed
> such map leak detection and then the test can pass consistently.

Please ignore this one. There is another patch with the same subject later but
with one more change to fix compilation error. The reason is I send this
patch but forget to fold a simple fix in. I waited for half an hour but
the email didn't reach me or mailing list. I do not know what is going
and thought it may have some issues. So I amended the fix and send
another one. It looks like the other patch appeared in the mailing list
too.

>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>   .../selftests/bpf/prog_tests/btf_map_in_map.c | 24 -------------------
>   1 file changed, 24 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
> index a8b53b8736f0..5d389eb16295 100644
> --- a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
> +++ b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
> @@ -102,30 +102,6 @@ static void test_lookup_update(void)
>   	CHECK(map1_id == 0, "map1_id", "failed to get ID 1\n");
>   	CHECK(map2_id == 0, "map2_id", "failed to get ID 2\n");
>   
> -	test_btf_map_in_map__destroy(skel);
> -	skel = NULL;
> -
> -	/* we need to either wait for or force synchronize_rcu(), before
> -	 * checking for "still exists" condition, otherwise map could still be
> -	 * resolvable by ID, causing false positives.
> -	 *
> -	 * Older kernels (5.8 and earlier) freed map only after two
> -	 * synchronize_rcu()s, so trigger two, to be entirely sure.
> -	 */
> -	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
> -	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
> -
> -	fd = bpf_map_get_fd_by_id(map1_id);
> -	if (CHECK(fd >= 0, "map1_leak", "inner_map1 leaked!\n")) {
> -		close(fd);
> -		goto cleanup;
> -	}
> -	fd = bpf_map_get_fd_by_id(map2_id);
> -	if (CHECK(fd >= 0, "map2_leak", "inner_map2 leaked!\n")) {
> -		close(fd);
> -		goto cleanup;
> -	}
> -
>   cleanup:
>   	test_btf_map_in_map__destroy(skel);
>   }
diff mbox series

Patch

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
index a8b53b8736f0..5d389eb16295 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
@@ -102,30 +102,6 @@  static void test_lookup_update(void)
 	CHECK(map1_id == 0, "map1_id", "failed to get ID 1\n");
 	CHECK(map2_id == 0, "map2_id", "failed to get ID 2\n");
 
-	test_btf_map_in_map__destroy(skel);
-	skel = NULL;
-
-	/* we need to either wait for or force synchronize_rcu(), before
-	 * checking for "still exists" condition, otherwise map could still be
-	 * resolvable by ID, causing false positives.
-	 *
-	 * Older kernels (5.8 and earlier) freed map only after two
-	 * synchronize_rcu()s, so trigger two, to be entirely sure.
-	 */
-	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
-	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
-
-	fd = bpf_map_get_fd_by_id(map1_id);
-	if (CHECK(fd >= 0, "map1_leak", "inner_map1 leaked!\n")) {
-		close(fd);
-		goto cleanup;
-	}
-	fd = bpf_map_get_fd_by_id(map2_id);
-	if (CHECK(fd >= 0, "map2_leak", "inner_map2 leaked!\n")) {
-		close(fd);
-		goto cleanup;
-	}
-
 cleanup:
 	test_btf_map_in_map__destroy(skel);
 }