
[bpf] selftests/bpf: Poll for receive in cg_storage_multi test

Message ID 20230403215834.26675-1-zhuyifei@google.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series: [bpf] selftests/bpf: Poll for receive in cg_storage_multi test

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 18 this patch: 18
netdev/cc_maintainers warning 9 maintainers not CCed: mykolal@fb.com song@kernel.org shuah@kernel.org haoluo@google.com yhs@fb.com john.fastabend@gmail.com kpsingh@kernel.org jolsa@kernel.org linux-kselftest@vger.kernel.org
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 18 this patch: 18
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 28 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-3 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-8 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-10 pending Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-11 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-13 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-14 fail Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-15 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-16 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-17 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-18 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-19 fail Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-20 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-21 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-23 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-25 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-27 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-28 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-29 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-30 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-31 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-32 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-33 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-34 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-35 success Logs for test_verifier on x86_64 with llvm-16

Commit Message

YiFei Zhu April 3, 2023, 9:58 p.m. UTC
In some cases the loopback latency might be large enough that the
assertion on invocations runs before the ingress prog is executed.
The assertion then fails and the test flakes.

This can be reliably reproduced by arbitrarily increasing the loopback
latency (thanks to [1]):
  tc qdisc add dev lo root handle 1: htb default 12
  tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps
  tc qdisc add dev lo parent 1:12 netem delay 100ms

Fix this by polling on the receive end and waiting for up to a
second, instead of returning to the assertion immediately.

[1] https://gist.github.com/kstevens715/4598301

Reported-by: Martin KaFai Lau <martin.lau@linux.dev>
Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/
Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 .../testing/selftests/bpf/prog_tests/cg_storage_multi.c  | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Stanislav Fomichev April 3, 2023, 10:16 p.m. UTC | #1
On Mon, Apr 3, 2023 at 2:59 PM YiFei Zhu <zhuyifei@google.com> wrote:
>
> In some cases the loopback latency might be large enough that the
> assertion on invocations runs before the ingress prog is executed.
> The assertion then fails and the test flakes.
>
> This can be reliably reproduced by arbitrarily increasing the loopback
> latency (thanks to [1]):
>   tc qdisc add dev lo root handle 1: htb default 12
>   tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps
>   tc qdisc add dev lo parent 1:12 netem delay 100ms
>
> Fix this by polling on the receive end and waiting for up to a
> second, instead of returning to the assertion immediately.
>
> [1] https://gist.github.com/kstevens715/4598301
>
> Reported-by: Martin KaFai Lau <martin.lau@linux.dev>
> Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/
> Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>

Thank you!

Acked-by: Stanislav Fomichev <sdf@google.com>

> ---
>  .../testing/selftests/bpf/prog_tests/cg_storage_multi.c  | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> index 621c57222191..3b0094a2a353 100644
> --- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> +++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> @@ -7,6 +7,7 @@
>  #include <test_progs.h>
>  #include <cgroup_helpers.h>
>  #include <network_helpers.h>
> +#include <poll.h>
>
>  #include "progs/cg_storage_multi.h"
>
> @@ -56,8 +57,9 @@ static bool assert_storage_noexist(struct bpf_map *map, const void *key)
>
>  static bool connect_send(const char *cgroup_path)
>  {
> -       bool res = true;
>         int server_fd = -1, client_fd = -1;
> +       struct pollfd pollfd;
> +       bool res = true;
>
>         if (join_cgroup(cgroup_path))
>                 goto out_clean;
> @@ -73,6 +75,11 @@ static bool connect_send(const char *cgroup_path)
>         if (send(client_fd, "message", strlen("message"), 0) < 0)
>                 goto out_clean;
>
> +       pollfd.fd = server_fd;
> +       pollfd.events = POLLIN;
> +       if (poll(&pollfd, 1, 1000) != 1)
> +               goto out_clean;
> +
>         res = false;
>
>  out_clean:
> --
> 2.40.0.348.gf938b09366-goog
>
Martin KaFai Lau April 3, 2023, 11:44 p.m. UTC | #2
On 4/3/23 2:58 PM, YiFei Zhu wrote:
> In some cases the loopback latency might be large enough that the
> assertion on invocations runs before the ingress prog is executed.
> The assertion then fails and the test flakes.
> 
> This can be reliably reproduced by arbitrarily increasing the loopback
> latency (thanks to [1]):
>    tc qdisc add dev lo root handle 1: htb default 12
>    tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps
>    tc qdisc add dev lo parent 1:12 netem delay 100ms
> 
> Fix this by polling on the receive end and waiting for up to a
> second, instead of returning to the assertion immediately.
> 
> [1] https://gist.github.com/kstevens715/4598301
> 
> Reported-by: Martin KaFai Lau <martin.lau@linux.dev>
> Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/
> Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> ---
>   .../testing/selftests/bpf/prog_tests/cg_storage_multi.c  | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> index 621c57222191..3b0094a2a353 100644
> --- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> +++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> @@ -7,6 +7,7 @@
>   #include <test_progs.h>
>   #include <cgroup_helpers.h>
>   #include <network_helpers.h>
> +#include <poll.h>
>   
>   #include "progs/cg_storage_multi.h"
>   
> @@ -56,8 +57,9 @@ static bool assert_storage_noexist(struct bpf_map *map, const void *key)
>   
>   static bool connect_send(const char *cgroup_path)
>   {
> -	bool res = true;
>   	int server_fd = -1, client_fd = -1;
> +	struct pollfd pollfd;
> +	bool res = true;
>   
>   	if (join_cgroup(cgroup_path))
>   		goto out_clean;
> @@ -73,6 +75,11 @@ static bool connect_send(const char *cgroup_path)
>   	if (send(client_fd, "message", strlen("message"), 0) < 0)
>   		goto out_clean;
>   
> +	pollfd.fd = server_fd;
> +	pollfd.events = POLLIN;
> +	if (poll(&pollfd, 1, 1000) != 1)
> +		goto out_clean;

Thanks for the fix. The slowness explanation makes sense.

A nit: all start_server() sockets get a 3s SO_RCVTIMEO by default. How
about a read() here instead of a poll()? That would make it easier to
change the default read timeout for all tests if needed.

Patch

diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
index 621c57222191..3b0094a2a353 100644
--- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
+++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
@@ -7,6 +7,7 @@
 #include <test_progs.h>
 #include <cgroup_helpers.h>
 #include <network_helpers.h>
+#include <poll.h>
 
 #include "progs/cg_storage_multi.h"
 
@@ -56,8 +57,9 @@ static bool assert_storage_noexist(struct bpf_map *map, const void *key)
 
 static bool connect_send(const char *cgroup_path)
 {
-	bool res = true;
 	int server_fd = -1, client_fd = -1;
+	struct pollfd pollfd;
+	bool res = true;
 
 	if (join_cgroup(cgroup_path))
 		goto out_clean;
@@ -73,6 +75,11 @@ static bool connect_send(const char *cgroup_path)
 	if (send(client_fd, "message", strlen("message"), 0) < 0)
 		goto out_clean;
 
+	pollfd.fd = server_fd;
+	pollfd.events = POLLIN;
+	if (poll(&pollfd, 1, 1000) != 1)
+		goto out_clean;
+
 	res = false;
 
 out_clean: