[bpf-next] selftests/bpf: fix a CI failure caused by vsock write

Message ID 20230831013105.2930824-1-xukuohai@huaweicloud.com (mailing list archive)
State Superseded
Delegated to: BPF
Series [bpf-next] selftests/bpf: fix a CI failure caused by vsock write

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-0 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 9 this patch: 9
netdev/cc_maintainers fail 2 blamed authors not CCed: sgarzare@redhat.com davem@davemloft.net; 17 maintainers not CCed: jakub@cloudflare.com kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com sdf@google.com xukuohai@huawei.com andrii@kernel.org yonghong.song@linux.dev shuah@kernel.org mykolal@fb.com sgarzare@redhat.com davem@davemloft.net linux-kselftest@vger.kernel.org song@kernel.org jolsa@kernel.org haoluo@google.com ast@kernel.org
netdev/build_clang success Errors and warnings before: 9 this patch: 9
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 9 this patch: 9
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 46 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on s390x with gcc

Commit Message

Xu Kuohai Aug. 31, 2023, 1:31 a.m. UTC
From: Xu Kuohai <xukuohai@huawei.com>

While commit 90f0074cd9f9 ("selftests/bpf: fix a CI failure caused by vsock sockmap test")
fixes a receive failure of vsock sockmap test, there is still a write failure:

Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
  ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected
  vsock_unix_redir_connectible:FAIL:1501
  ./test_progs:vsock_unix_redir_connectible:1501: ingress: write: Transport endpoint is not connected
  vsock_unix_redir_connectible:FAIL:1501
  ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected
  vsock_unix_redir_connectible:FAIL:1501

The reason is that the vsock connection in the test is moved to the ESTABLISHED
state by virtio_transport_recv_pkt(), which runs in a workqueue thread. If the
user space test thread runs before that workqueue thread, the write is issued
on a socket that is not yet connected and fails.

To fix it, wait for the connection to be established before writing to it.

Fixes: d61bd8c1fd02 ("selftests/bpf: add a test case for vsock sockmap")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
---
 .../bpf/prog_tests/sockmap_helpers.h          | 29 +++++++++++++++++++
 .../selftests/bpf/prog_tests/sockmap_listen.c |  5 ++++
 2 files changed, 34 insertions(+)

Comments

Daniel Borkmann Aug. 31, 2023, 12:09 p.m. UTC | #1
On 8/31/23 3:31 AM, Xu Kuohai wrote:
> From: Xu Kuohai <xukuohai@huawei.com>
> 
> While commit 90f0074cd9f9 ("selftests/bpf: fix a CI failure caused by vsock sockmap test")
> fixes a receive failure of vsock sockmap test, there is still a write failure:
> 
> Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
> Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
>    ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected
>    vsock_unix_redir_connectible:FAIL:1501
>    ./test_progs:vsock_unix_redir_connectible:1501: ingress: write: Transport endpoint is not connected
>    vsock_unix_redir_connectible:FAIL:1501
>    ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected
>    vsock_unix_redir_connectible:FAIL:1501
> 
> The reason is that the vsock connection in the test is set to ESTABLISHED state
> by function virtio_transport_recv_pkt, which is executed in a workqueue thread,
> so when the user space test thread runs before the workqueue thread, this
> problem occurs.
> 
> To fix it, before writing the connection, wait for it to be connected.
> 
> Fixes: d61bd8c1fd02 ("selftests/bpf: add a test case for vsock sockmap")
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>

Thanks for the fix! Looks like this is gone now at least in the tests which succeed,
but there are still two issues:

1) s390x fails in BPF CI as below:

https://github.com/kernel-patches/bpf/actions/runs/6031993528/job/16366784236

Error: #211 sockmap_listen
Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
   Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
   ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
   vsock_socketpair_connectible:FAIL:1456
   ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
   vsock_unix_redir_connectible:FAIL:1494
   ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
   vsock_socketpair_connectible:FAIL:1456
   ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
   vsock_unix_redir_connectible:FAIL:1494
   ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
   vsock_socketpair_connectible:FAIL:1456
   ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
   vsock_unix_redir_connectible:FAIL:1494
   ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
   vsock_socketpair_connectible:FAIL:1456
   ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
   vsock_unix_redir_connectible:FAIL:1494
Error: #211/158 sockmap_listen/sockhash VSOCK test_vsock_redir
   Error: #211/158 sockmap_listen/sockhash VSOCK test_vsock_redir
   ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
   vsock_socketpair_connectible:FAIL:1456
   ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
   vsock_unix_redir_connectible:FAIL:1494
   ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
   vsock_socketpair_connectible:FAIL:1456
   ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
   vsock_unix_redir_connectible:FAIL:1494
   ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
   vsock_socketpair_connectible:FAIL:1456
   ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
   vsock_unix_redir_connectible:FAIL:1494
   ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
   vsock_socketpair_connectible:FAIL:1456
   ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
   vsock_unix_redir_connectible:FAIL:1494

2) Various panics, some GPFs but also seen NULL pointer derefs, discussed in the other
    thread: https://lore.kernel.org/bpf/ZO+RQwJhPhYcNGAi@krava/

I believe issue 1) might still be related to your fix in here, ptal.

Thanks,
Daniel

Xu Kuohai Aug. 31, 2023, 12:40 p.m. UTC | #2
On 8/31/2023 8:09 PM, Daniel Borkmann wrote:
> On 8/31/23 3:31 AM, Xu Kuohai wrote:
>> From: Xu Kuohai <xukuohai@huawei.com>
>>
>> While commit 90f0074cd9f9 ("selftests/bpf: fix a CI failure caused by vsock sockmap test")
>> fixes a receive failure of vsock sockmap test, there is still a write failure:
>>
>> Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
>> Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
>>    ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected
>>    vsock_unix_redir_connectible:FAIL:1501
>>    ./test_progs:vsock_unix_redir_connectible:1501: ingress: write: Transport endpoint is not connected
>>    vsock_unix_redir_connectible:FAIL:1501
>>    ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected
>>    vsock_unix_redir_connectible:FAIL:1501
>>
>> The reason is that the vsock connection in the test is set to ESTABLISHED state
>> by function virtio_transport_recv_pkt, which is executed in a workqueue thread,
>> so when the user space test thread runs before the workqueue thread, this
>> problem occurs.
>>
>> To fix it, before writing the connection, wait for it to be connected.
>>
>> Fixes: d61bd8c1fd02 ("selftests/bpf: add a test case for vsock sockmap")
>> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> 
> Thanks for the fix! Looks like this is gone now at least in the tests which succeed,
> but there are still two issues:
> 
> 1) s390x fails in BPF CI as below:
> 
> https://github.com/kernel-patches/bpf/actions/runs/6031993528/job/16366784236
> 
> Error: #211 sockmap_listen
> Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
>    Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
>    ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
>    vsock_socketpair_connectible:FAIL:1456
>    ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
>    vsock_unix_redir_connectible:FAIL:1494
>    ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
>    vsock_socketpair_connectible:FAIL:1456
>    ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
>    vsock_unix_redir_connectible:FAIL:1494
>    ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
>    vsock_socketpair_connectible:FAIL:1456
>    ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
>    vsock_unix_redir_connectible:FAIL:1494
>    ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
>    vsock_socketpair_connectible:FAIL:1456
>    ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
>    vsock_unix_redir_connectible:FAIL:1494
> Error: #211/158 sockmap_listen/sockhash VSOCK test_vsock_redir
>    Error: #211/158 sockmap_listen/sockhash VSOCK test_vsock_redir
>    ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
>    vsock_socketpair_connectible:FAIL:1456
>    ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
>    vsock_unix_redir_connectible:FAIL:1494
>    ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
>    vsock_socketpair_connectible:FAIL:1456
>    ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
>    vsock_unix_redir_connectible:FAIL:1494
>    ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
>    vsock_socketpair_connectible:FAIL:1456
>    ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
>    vsock_unix_redir_connectible:FAIL:1494
>    ./test_progs:vsock_socketpair_connectible:1456: poll_connect: Invalid argument
>    vsock_socketpair_connectible:FAIL:1456
>    ./test_progs:vsock_unix_redir_connectible:1494: vsock_socketpair_connectible() failed
>    vsock_unix_redir_connectible:FAIL:1494
> 

Oops, I think it's because the esize variable is not initialized,
causing getsockopt to read a garbage value.

> 2) Various panics, some GPFs but also seen NULL pointer derefs, discussed in the other
>     thread: https://lore.kernel.org/bpf/ZO+RQwJhPhYcNGAi@krava/
>

still debugging ...

> I believe issue 1) might still be related to your fix in here, ptal.
> 
Sorry for introducing issue 1), will post a fix soon.

> Thanks,
> Daniel
> 
> .

Eduard Zingerman Aug. 31, 2023, 12:58 p.m. UTC | #3
On Thu, 2023-08-31 at 09:31 +0800, Xu Kuohai wrote:
> From: Xu Kuohai <xukuohai@huawei.com>
> 
> While commit 90f0074cd9f9 ("selftests/bpf: fix a CI failure caused by vsock sockmap test")
> fixes a receive failure of vsock sockmap test, there is still a write failure:
> 
> Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
> Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir
>   ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected
>   vsock_unix_redir_connectible:FAIL:1501
>   ./test_progs:vsock_unix_redir_connectible:1501: ingress: write: Transport endpoint is not connected
>   vsock_unix_redir_connectible:FAIL:1501
>   ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected
>   vsock_unix_redir_connectible:FAIL:1501
> 
> The reason is that the vsock connection in the test is set to ESTABLISHED state
> by function virtio_transport_recv_pkt, which is executed in a workqueue thread,
> so when the user space test thread runs before the workqueue thread, this
> problem occurs.
> 
> To fix it, before writing the connection, wait for it to be connected.

Fun fact:
while trying this patch I hit an oops [1]. I'm currently trying to
bisect the commit causing this oops and have the following observation:
- good revisions don't have this test as flip-flop
- bad revisions have this test as flip-flop.

[1] https://lore.kernel.org/bpf/de816b89073544deb2ce34c4b242d583a6d4660f.camel@gmail.com/

> 
> Fixes: d61bd8c1fd02 ("selftests/bpf: add a test case for vsock sockmap")
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> ---
>  .../bpf/prog_tests/sockmap_helpers.h          | 29 +++++++++++++++++++
>  .../selftests/bpf/prog_tests/sockmap_listen.c |  5 ++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h
> index d12665490a90..837dfb0361c6 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h
> @@ -179,6 +179,35 @@
>  		__ret;                                                         \
>  	})
>  
> +static inline int poll_connect(int fd, unsigned int timeout_sec)
> +{
> +	struct timeval timeout = { .tv_sec = timeout_sec };
> +	fd_set wfds;
> +	int r;
> +	int eval;
> +	socklen_t esize;
> +
> +	FD_ZERO(&wfds);
> +	FD_SET(fd, &wfds);
> +
> +	r = select(fd + 1, NULL, &wfds, NULL, &timeout);
> +	if (r == 0)
> +		errno = ETIME;
> +
> +	if (r != 1)
> +		return -1;
> +
> +	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &eval, &esize) < 0)
> +		return -1;
> +
> +	if (eval != 0) {
> +		errno = eval;
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
>  static inline int poll_read(int fd, unsigned int timeout_sec)
>  {
>  	struct timeval timeout = { .tv_sec = timeout_sec };
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> index 5674a9d0cacf..2d3bf38677b6 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> @@ -1452,6 +1452,11 @@ static int vsock_socketpair_connectible(int sotype, int *v0, int *v1)
>  	if (p < 0)
>  		goto close_cli;
>  
> +	if (poll_connect(c, IO_TIMEOUT_SEC) < 0) {
> +		FAIL_ERRNO("poll_connect");
> +		goto close_cli;
> +	}
> +
>  	*v0 = p;
>  	*v1 = c;
>

Patch

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h
index d12665490a90..837dfb0361c6 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h
@@ -179,6 +179,35 @@ 
 		__ret;                                                         \
 	})
 
+static inline int poll_connect(int fd, unsigned int timeout_sec)
+{
+	struct timeval timeout = { .tv_sec = timeout_sec };
+	fd_set wfds;
+	int r;
+	int eval;
+	socklen_t esize;
+
+	FD_ZERO(&wfds);
+	FD_SET(fd, &wfds);
+
+	r = select(fd + 1, NULL, &wfds, NULL, &timeout);
+	if (r == 0)
+		errno = ETIME;
+
+	if (r != 1)
+		return -1;
+
+	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &eval, &esize) < 0)
+		return -1;
+
+	if (eval != 0) {
+		errno = eval;
+		return -1;
+	}
+
+	return 0;
+}
+
 static inline int poll_read(int fd, unsigned int timeout_sec)
 {
 	struct timeval timeout = { .tv_sec = timeout_sec };
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 5674a9d0cacf..2d3bf38677b6 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1452,6 +1452,11 @@  static int vsock_socketpair_connectible(int sotype, int *v0, int *v1)
 	if (p < 0)
 		goto close_cli;
 
+	if (poll_connect(c, IO_TIMEOUT_SEC) < 0) {
+		FAIL_ERRNO("poll_connect");
+		goto close_cli;
+	}
+
 	*v0 = p;
 	*v1 = c;
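
Note on the "poll_connect: Invalid argument" failure reported in the comments: getsockopt() treats its last argument as a value-result parameter, so the socklen_t passed in must already hold the size of the output buffer. In the patch above esize is declared but never set, which matches the explanation given in reply #2 that getsockopt reads a garbage value. The following is a minimal sketch of how the helper could look with that length initialized. It is not the follow-up patch from this thread; poll_connect_fixed and demo_connected_pair are hypothetical names used only for illustration, and the demo uses an AF_UNIX socketpair rather than vsock so that it runs anywhere.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical variant of poll_connect() from the patch above.
 * The only intended change is that eval and esize are initialized. */
static int poll_connect_fixed(int fd, unsigned int timeout_sec)
{
	struct timeval timeout = { .tv_sec = timeout_sec };
	fd_set wfds;
	int r;
	int eval = 0;
	/* Value-result argument: must hold the buffer size on entry. */
	socklen_t esize = sizeof(eval);

	FD_ZERO(&wfds);
	FD_SET(fd, &wfds);

	/* A socket becomes writable once the connect attempt has finished. */
	r = select(fd + 1, NULL, &wfds, NULL, &timeout);
	if (r == 0)
		errno = ETIME;

	if (r != 1)
		return -1;

	/* SO_ERROR reports whether the connect attempt succeeded. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &eval, &esize) < 0)
		return -1;

	if (eval != 0) {
		errno = eval;
		return -1;
	}

	return 0;
}

/* Illustration only: an already connected AF_UNIX pair should be
 * writable with no pending error, so the helper returns 0. */
static int demo_connected_pair(void)
{
	int sv[2];
	int ret;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	ret = poll_connect_fixed(sv[0], 1);
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

With the length initialized, a failure from the helper reflects the state of the socket rather than whatever happened to be on the stack.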