
[bpf-next] selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout

Message ID 20220727182955.4044988-1-deso@posteo.net (mailing list archive)
State Accepted
Commit 639de43ef0dda165441af400ecb372e16b7f9354
Delegated to: BPF
Series [bpf-next] selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for bpf-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           warning  11 maintainers not CCed: haoluo@google.com sdf@google.com john.fastabend@gmail.com jolsa@kernel.org shuah@kernel.org yhs@fb.com martin.lau@linux.dev kpsingh@kernel.org linux-kselftest@vger.kernel.org mykolal@fb.com song@kernel.org
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 21 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
bpf/vmtest-bpf-next-PR          fail     PR summary
bpf/vmtest-bpf-next-VM_Test-1   success  Logs for Kernel LATEST on Array with gcc
bpf/vmtest-bpf-next-VM_Test-2   fail     Logs for Kernel LATEST on Array with gcc
bpf/vmtest-bpf-next-VM_Test-3   success  Logs for Kernel LATEST on Array with llvm-15

Commit Message

Daniel Müller July 27, 2022, 6:29 p.m. UTC
The send_signal/send_signal_tracepoint test is pretty flaky, failing at
least once in every ten runs in the few attempts I have made:
  > test_send_signal_common:PASS:pipe_c2p 0 nsec
  > test_send_signal_common:PASS:pipe_p2c 0 nsec
  > test_send_signal_common:PASS:fork 0 nsec
  > test_send_signal_common:PASS:skel_open_and_load 0 nsec
  > test_send_signal_common:PASS:skel_attach 0 nsec
  > test_send_signal_common:PASS:pipe_read 0 nsec
  > test_send_signal_common:PASS:pipe_write 0 nsec
  > test_send_signal_common:PASS:reading pipe 0 nsec
  > test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
  > test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
  > test_send_signal_common:PASS:pipe_write 0 nsec
  > #139/1   send_signal/send_signal_tracepoint:FAIL

The reason does not appear to be a correctness issue in the strict
sense. Rather, we merely do not receive the signal we are waiting for
within the provided timeout, which is the iteration bound of the
busy-wait loop.

Let's bump that bound by a factor of ten, from 100000000 to 1000000000
iterations. With that change I have not been able to reproduce the
failure in 150+ test runs. I am also sneaking in a small simplification
to the test_progs test selection logic.

Signed-off-by: Daniel Müller <deso@posteo.net>
---
 tools/testing/selftests/bpf/prog_tests/send_signal.c | 2 +-
 tools/testing/selftests/bpf/test_progs.c             | 7 ++-----
 2 files changed, 3 insertions(+), 6 deletions(-)
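
For readers puzzled by "actual 48 != expected 50" in the log above: the
child writes a single status character into the pipe, as the patch context
line `buf[0] = sigusr1_received ? '2' : '0';` shows, and 48 and 50 are the
ASCII codes of '0' and '2'. The failure therefore appears to mean "signal
not seen before the loop gave up". A minimal sketch of that encoding
follows; the helper names are hypothetical and do not exist in the selftest.

```c
#include <stdbool.h>

/* Hypothetical helper: mirrors the selftest's status byte, where
 * '2' (ASCII 50) means the SIGUSR1 handler ran and '0' (ASCII 48)
 * means the wait loop ended without seeing the signal. */
static char encode_signal_status(bool received)
{
	return received ? '2' : '0';
}

/* Hypothetical helper: interpret the byte read from the pipe. */
static bool signal_was_received(char status)
{
	return status == '2';
}
```

Read this way, the reported mismatch is the child sending '0' where '2' was
expected, which fits the timeout explanation in the commit message.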

Comments

Jiri Olsa July 28, 2022, 1:58 p.m. UTC | #1
On Wed, Jul 27, 2022 at 06:29:55PM +0000, Daniel Müller wrote:
> The send_signal/send_signal_tracepoint is pretty flaky, with at least
> one failure in every ten runs on a few attempts I've tried it:
>   > test_send_signal_common:PASS:pipe_c2p 0 nsec
>   > test_send_signal_common:PASS:pipe_p2c 0 nsec
>   > test_send_signal_common:PASS:fork 0 nsec
>   > test_send_signal_common:PASS:skel_open_and_load 0 nsec
>   > test_send_signal_common:PASS:skel_attach 0 nsec
>   > test_send_signal_common:PASS:pipe_read 0 nsec
>   > test_send_signal_common:PASS:pipe_write 0 nsec
>   > test_send_signal_common:PASS:reading pipe 0 nsec
>   > test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
>   > test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
>   > test_send_signal_common:PASS:pipe_write 0 nsec
>   > #139/1   send_signal/send_signal_tracepoint:FAIL
> 
> The reason does not appear to be a correctness issue in the strict
> sense. Rather, we merely do not receive the signal we are waiting for
> within the provided timeout.
> Let's bump the timeout by a factor of ten. With that change I have not
> been able to reproduce the failure in 150+ iterations. I am also sneaking
> in a small simplification to the test_progs test selection logic.
> 
> Signed-off-by: Daniel Müller <deso@posteo.net>

I reproduced the failure and can no longer reproduce it with the fix applied.

Acked-by: Jiri Olsa <jolsa@kernel.org>

jirka

> ---
>  tools/testing/selftests/bpf/prog_tests/send_signal.c | 2 +-
>  tools/testing/selftests/bpf/test_progs.c             | 7 ++-----
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index d71226e..d63a20 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -64,7 +64,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
>  		ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
>  
>  		/* wait a little for signal handler */
> -		for (int i = 0; i < 100000000 && !sigusr1_received; i++)
> +		for (int i = 0; i < 1000000000 && !sigusr1_received; i++)
>  			j /= i + j + 1;
>  
>  		buf[0] = sigusr1_received ? '2' : '0';
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index c639f2e..3561c9 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -1604,11 +1604,8 @@ int main(int argc, char **argv)
>  		struct prog_test_def *test = &prog_test_defs[i];
>  
>  		test->test_num = i + 1;
> -		if (should_run(&env.test_selector,
> -				test->test_num, test->test_name))
> -			test->should_run = true;
> -		else
> -			test->should_run = false;
> +		test->should_run = should_run(&env.test_selector,
> +					      test->test_num, test->test_name);
>  
>  		if ((test->run_test == NULL && test->run_serial_test == NULL) ||
>  		    (test->run_test != NULL && test->run_serial_test != NULL)) {
> -- 
> 2.30.2
>
Yonghong Song July 28, 2022, 5:28 p.m. UTC | #2
On 7/27/22 11:29 AM, Daniel Müller wrote:
> The send_signal/send_signal_tracepoint is pretty flaky, with at least
> one failure in every ten runs on a few attempts I've tried it:
>    > test_send_signal_common:PASS:pipe_c2p 0 nsec
>    > test_send_signal_common:PASS:pipe_p2c 0 nsec
>    > test_send_signal_common:PASS:fork 0 nsec
>    > test_send_signal_common:PASS:skel_open_and_load 0 nsec
>    > test_send_signal_common:PASS:skel_attach 0 nsec
>    > test_send_signal_common:PASS:pipe_read 0 nsec
>    > test_send_signal_common:PASS:pipe_write 0 nsec
>    > test_send_signal_common:PASS:reading pipe 0 nsec
>    > test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
>    > test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
>    > test_send_signal_common:PASS:pipe_write 0 nsec
>    > #139/1   send_signal/send_signal_tracepoint:FAIL
> 
> The reason does not appear to be a correctness issue in the strict
> sense. Rather, we merely do not receive the signal we are waiting for
> within the provided timeout.
> Let's bump the timeout by a factor of ten. With that change I have not
> been able to reproduce the failure in 150+ iterations. I am also sneaking
> in a small simplification to the test_progs test selection logic.
> 
> Signed-off-by: Daniel Müller <deso@posteo.net>

Okay, this test has been improved *multiple* times to address its
flakiness. We tried very hard not to increase its runtime so that
we don't increase the overall test_progs run time. But it looks like
we have to do it to make the test robust. Hopefully a 10x increase in
the number of iterations can finally address the flakiness issue.

Acked-by: Yonghong Song <yhs@fb.com>

> ---
>   tools/testing/selftests/bpf/prog_tests/send_signal.c | 2 +-
>   tools/testing/selftests/bpf/test_progs.c             | 7 ++-----
>   2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index d71226e..d63a20 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -64,7 +64,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
>   		ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
>   
>   		/* wait a little for signal handler */
> -		for (int i = 0; i < 100000000 && !sigusr1_received; i++)
> +		for (int i = 0; i < 1000000000 && !sigusr1_received; i++)
>   			j /= i + j + 1;
>   
>   		buf[0] = sigusr1_received ? '2' : '0';
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index c639f2e..3561c9 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -1604,11 +1604,8 @@ int main(int argc, char **argv)
>   		struct prog_test_def *test = &prog_test_defs[i];
>   
>   		test->test_num = i + 1;
> -		if (should_run(&env.test_selector,
> -				test->test_num, test->test_name))
> -			test->should_run = true;
> -		else
> -			test->should_run = false;
> +		test->should_run = should_run(&env.test_selector,
> +					      test->test_num, test->test_name);
>   
>   		if ((test->run_test == NULL && test->run_serial_test == NULL) ||
>   		    (test->run_test != NULL && test->run_serial_test != NULL)) {
patchwork-bot+netdevbpf@kernel.org July 29, 2022, 6:20 p.m. UTC | #3
Hello:

This patch was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Wed, 27 Jul 2022 18:29:55 +0000 you wrote:
> The send_signal/send_signal_tracepoint is pretty flaky, with at least
> one failure in every ten runs on a few attempts I've tried it:
>   > test_send_signal_common:PASS:pipe_c2p 0 nsec
>   > test_send_signal_common:PASS:pipe_p2c 0 nsec
>   > test_send_signal_common:PASS:fork 0 nsec
>   > test_send_signal_common:PASS:skel_open_and_load 0 nsec
>   > test_send_signal_common:PASS:skel_attach 0 nsec
>   > test_send_signal_common:PASS:pipe_read 0 nsec
>   > test_send_signal_common:PASS:pipe_write 0 nsec
>   > test_send_signal_common:PASS:reading pipe 0 nsec
>   > test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
>   > test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
>   > test_send_signal_common:PASS:pipe_write 0 nsec
>   > #139/1   send_signal/send_signal_tracepoint:FAIL
> 
> [...]

Here is the summary with links:
  - [bpf-next] selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout
    https://git.kernel.org/bpf/bpf-next/c/639de43ef0dd

You are awesome, thank you!

Patch

diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index d71226e..d63a20 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -64,7 +64,7 @@  static void test_send_signal_common(struct perf_event_attr *attr,
 		ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
 
 		/* wait a little for signal handler */
-		for (int i = 0; i < 100000000 && !sigusr1_received; i++)
+		for (int i = 0; i < 1000000000 && !sigusr1_received; i++)
 			j /= i + j + 1;
 
 		buf[0] = sigusr1_received ? '2' : '0';
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index c639f2e..3561c9 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -1604,11 +1604,8 @@  int main(int argc, char **argv)
 		struct prog_test_def *test = &prog_test_defs[i];
 
 		test->test_num = i + 1;
-		if (should_run(&env.test_selector,
-				test->test_num, test->test_name))
-			test->should_run = true;
-		else
-			test->should_run = false;
+		test->should_run = should_run(&env.test_selector,
+					      test->test_num, test->test_name);
 
 		if ((test->run_test == NULL && test->run_serial_test == NULL) ||
 		    (test->run_test != NULL && test->run_serial_test != NULL)) {
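
The first hunk can be read in isolation as a bounded busy-wait on a flag
set by a signal handler. The following is a minimal standalone sketch of
that pattern, not the selftest itself: the names wait_for_sigusr1 and
max_iters are hypothetical, and the types of the flag and of `j` are
assumptions, since the diff does not show their declarations.

```c
#include <signal.h>
#include <stdbool.h>

/* Flag written by the handler; sig_atomic_t is assumed here. */
static volatile sig_atomic_t sigusr1_received;

static void sigusr1_handler(int signum)
{
	(void)signum;
	sigusr1_received = 1;
}

/* Spin for at most max_iters iterations, stopping early once the
 * handler has run. Returns true if the signal was seen. The bound is
 * an iteration count, not a wall-clock timeout, so the real waiting
 * time depends on CPU speed and scheduling. */
static bool wait_for_sigusr1(long max_iters)
{
	volatile int j = 0;

	for (long i = 0; i < max_iters && !sigusr1_received; i++)
		j /= i + j + 1;	/* cheap work that cannot be optimized out */

	return sigusr1_received != 0;
}
```

With this shape, raising the bound from 100000000 to 1000000000 only
lengthens the worst case: when the signal arrives promptly, the loop still
exits on the first iteration that observes the flag.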