Message ID | 20220727182955.4044988-1-deso@posteo.net |
---|---|
State | Accepted |
Commit | 639de43ef0dda165441af400ecb372e16b7f9354 |
Delegated to: | BPF |
Series | [bpf-next] selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout |

On Wed, Jul 27, 2022 at 06:29:55PM +0000, Daniel Müller wrote:
> The send_signal/send_signal_tracepoint is pretty flaky, with at least
> one failure in every ten runs on a few attempts I've tried it:
> > test_send_signal_common:PASS:pipe_c2p 0 nsec
> > test_send_signal_common:PASS:pipe_p2c 0 nsec
> > test_send_signal_common:PASS:fork 0 nsec
> > test_send_signal_common:PASS:skel_open_and_load 0 nsec
> > test_send_signal_common:PASS:skel_attach 0 nsec
> > test_send_signal_common:PASS:pipe_read 0 nsec
> > test_send_signal_common:PASS:pipe_write 0 nsec
> > test_send_signal_common:PASS:reading pipe 0 nsec
> > test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
> > test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
> > test_send_signal_common:PASS:pipe_write 0 nsec
> > #139/1 send_signal/send_signal_tracepoint:FAIL
>
> The reason does not appear to be a correctness issue in the strict
> sense. Rather, we merely do not receive the signal we are waiting for
> within the provided timeout.
> Let's bump the timeout by a factor of ten. With that change I have not
> been able to reproduce the failure in 150+ iterations. I am also sneaking
> in a small simplification to the test_progs test selection logic.
>
> Signed-off-by: Daniel Müller <deso@posteo.net>

I reproduced the fail, can't reproduce anymore with the fix

Acked-by: Jiri Olsa <jolsa@kernel.org>

jirka

> ---
>  tools/testing/selftests/bpf/prog_tests/send_signal.c | 2 +-
>  tools/testing/selftests/bpf/test_progs.c | 7 ++-----
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index d71226e..d63a20 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -64,7 +64,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
>  		ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
>
>  		/* wait a little for signal handler */
> -		for (int i = 0; i < 100000000 && !sigusr1_received; i++)
> +		for (int i = 0; i < 1000000000 && !sigusr1_received; i++)
>  			j /= i + j + 1;
>
>  		buf[0] = sigusr1_received ? '2' : '0';
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index c639f2e..3561c9 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -1604,11 +1604,8 @@ int main(int argc, char **argv)
>  		struct prog_test_def *test = &prog_test_defs[i];
>
>  		test->test_num = i + 1;
> -		if (should_run(&env.test_selector,
> -			       test->test_num, test->test_name))
> -			test->should_run = true;
> -		else
> -			test->should_run = false;
> +		test->should_run = should_run(&env.test_selector,
> +					      test->test_num, test->test_name);
>
>  		if ((test->run_test == NULL && test->run_serial_test == NULL) ||
>  		    (test->run_test != NULL && test->run_serial_test != NULL)) {
> --
> 2.30.2
>

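A note on reading the failing assertion in the log above: the values 48 and 50 are the ASCII codes of '0' and '2', which matches the patched line `buf[0] = sigusr1_received ? '2' : '0';`. The following is a minimal sketch, assuming the test compares that byte as an integer; `encode_result` and `describe_result` are hypothetical helpers written for illustration and are not part of the selftest.

```c
/*
 * Hypothetical helper mirroring the line from the patch:
 *     buf[0] = sigusr1_received ? '2' : '0';
 */
static char encode_result(int sigusr1_received)
{
	return sigusr1_received ? '2' : '0';
}

/*
 * Hypothetical helper: map the integer printed in the log
 * ("actual 48 != expected 50") back to what it stands for.
 */
static const char *describe_result(int value)
{
	switch (value) {
	case '2': /* ASCII 50 */
		return "signal received";
	case '0': /* ASCII 48 */
		return "signal not received before the loop ended";
	default:
		return "unknown";
	}
}
```

Under that assumption, "actual 48" means the wait loop ran out of iterations before the handler set the flag, which is consistent with the timeout explanation in the commit message.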
On 7/27/22 11:29 AM, Daniel Müller wrote:
> The send_signal/send_signal_tracepoint is pretty flaky, with at least
> one failure in every ten runs on a few attempts I've tried it:
> > test_send_signal_common:PASS:pipe_c2p 0 nsec
> > test_send_signal_common:PASS:pipe_p2c 0 nsec
> > test_send_signal_common:PASS:fork 0 nsec
> > test_send_signal_common:PASS:skel_open_and_load 0 nsec
> > test_send_signal_common:PASS:skel_attach 0 nsec
> > test_send_signal_common:PASS:pipe_read 0 nsec
> > test_send_signal_common:PASS:pipe_write 0 nsec
> > test_send_signal_common:PASS:reading pipe 0 nsec
> > test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
> > test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
> > test_send_signal_common:PASS:pipe_write 0 nsec
> > #139/1 send_signal/send_signal_tracepoint:FAIL
>
> The reason does not appear to be a correctness issue in the strict
> sense. Rather, we merely do not receive the signal we are waiting for
> within the provided timeout.
> Let's bump the timeout by a factor of ten. With that change I have not
> been able to reproduce the failure in 150+ iterations. I am also sneaking
> in a small simplification to the test_progs test selection logic.
>
> Signed-off-by: Daniel Müller <deso@posteo.net>

Okay, this test has been improved *multiple* times to address its
flakiness. We tried very hard not to increase the runtime for it so we
don't increase overall test_progs run time. But looks like we have to
do it to make it robust. Hopefully such a 10x number of iterations can
finally address the flakiness issue.

Acked-by: Yonghong Song <yhs@fb.com>

> ---
>  tools/testing/selftests/bpf/prog_tests/send_signal.c | 2 +-
>  tools/testing/selftests/bpf/test_progs.c | 7 ++-----
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index d71226e..d63a20 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -64,7 +64,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
>  		ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
>
>  		/* wait a little for signal handler */
> -		for (int i = 0; i < 100000000 && !sigusr1_received; i++)
> +		for (int i = 0; i < 1000000000 && !sigusr1_received; i++)
>  			j /= i + j + 1;
>
>  		buf[0] = sigusr1_received ? '2' : '0';
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index c639f2e..3561c9 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -1604,11 +1604,8 @@ int main(int argc, char **argv)
>  		struct prog_test_def *test = &prog_test_defs[i];
>
>  		test->test_num = i + 1;
> -		if (should_run(&env.test_selector,
> -			       test->test_num, test->test_name))
> -			test->should_run = true;
> -		else
> -			test->should_run = false;
> +		test->should_run = should_run(&env.test_selector,
> +					      test->test_num, test->test_name);
>
>  		if ((test->run_test == NULL && test->run_serial_test == NULL) ||
>  		    (test->run_test != NULL && test->run_serial_test != NULL)) {

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Wed, 27 Jul 2022 18:29:55 +0000 you wrote:
> The send_signal/send_signal_tracepoint is pretty flaky, with at least
> one failure in every ten runs on a few attempts I've tried it:
> > test_send_signal_common:PASS:pipe_c2p 0 nsec
> > test_send_signal_common:PASS:pipe_p2c 0 nsec
> > test_send_signal_common:PASS:fork 0 nsec
> > test_send_signal_common:PASS:skel_open_and_load 0 nsec
> > test_send_signal_common:PASS:skel_attach 0 nsec
> > test_send_signal_common:PASS:pipe_read 0 nsec
> > test_send_signal_common:PASS:pipe_write 0 nsec
> > test_send_signal_common:PASS:reading pipe 0 nsec
> > test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
> > test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
> > test_send_signal_common:PASS:pipe_write 0 nsec
> > #139/1 send_signal/send_signal_tracepoint:FAIL
>
> [...]

Here is the summary with links:
  - [bpf-next] selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout
    https://git.kernel.org/bpf/bpf-next/c/639de43ef0dd

You are awesome, thank you!

diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index d71226e..d63a20 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -64,7 +64,7 @@ static void test_send_signal_common(struct perf_event_attr *attr,
 		ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");

 		/* wait a little for signal handler */
-		for (int i = 0; i < 100000000 && !sigusr1_received; i++)
+		for (int i = 0; i < 1000000000 && !sigusr1_received; i++)
 			j /= i + j + 1;

 		buf[0] = sigusr1_received ? '2' : '0';
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index c639f2e..3561c9 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -1604,11 +1604,8 @@ int main(int argc, char **argv)
 		struct prog_test_def *test = &prog_test_defs[i];

 		test->test_num = i + 1;
-		if (should_run(&env.test_selector,
-			       test->test_num, test->test_name))
-			test->should_run = true;
-		else
-			test->should_run = false;
+		test->should_run = should_run(&env.test_selector,
+					      test->test_num, test->test_name);

 		if ((test->run_test == NULL && test->run_serial_test == NULL) ||
 		    (test->run_test != NULL && test->run_serial_test != NULL)) {
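
For readers unfamiliar with the pattern in the first hunk, the "timeout" is an iteration budget on a busy-wait loop that exits early once the signal handler has set a flag. The following is a minimal standalone sketch of that pattern, not the selftest itself: `spin_until_signal`, the handler name, the flag's type, and the initial value of `j` are assumptions made for illustration.

```c
#include <signal.h>

/* Flag written by the handler and polled by the loop. */
static volatile sig_atomic_t sigusr1_received;

/* Assumed handler: only records that SIGUSR1 arrived. */
static void sigusr1_handler(int signo)
{
	(void)signo;
	sigusr1_received = 1;
}

/*
 * Hypothetical helper: spin for at most max_iters iterations, stopping
 * early once the flag is set. Returns 1 if the signal was seen and 0 if
 * the iteration budget ran out first.
 */
static int spin_until_signal(long max_iters)
{
	/* volatile so the compiler keeps the otherwise useless arithmetic */
	volatile long j = 12345;

	for (long i = 0; i < max_iters && !sigusr1_received; i++)
		j /= i + j + 1;	/* divisor is always >= 1 here */

	return sigusr1_received ? 1 : 0;
}
```

Because the bound is a loop count and not a wall-clock duration, the effective wait depends on how fast the machine executes the loop. Raising the bound from 100000000 to 1000000000 extends the worst case by a factor of ten while leaving the common case, where the signal arrives early, unchanged.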