Message ID | 20230804073740.194770-4-xukuohai@huaweicloud.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 90f0074cd9f9a24b7b6c4d5afffa676aee48c0e9 |
Delegated to: | BPF |
Headers | show |
Series | bug fixes for sockmap | expand |
Hi Xu, On 8/4/23 9:37 AM, Xu Kuohai wrote: > From: Xu Kuohai <xukuohai@huawei.com> > > BPF CI has reported the following failure: > > Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir > Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir > ./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected > vsock_unix_redir_connectible:FAIL:1506 > ./test_progs:vsock_unix_redir_connectible:1506: ingress: write: Transport endpoint is not connected > vsock_unix_redir_connectible:FAIL:1506 > ./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected > vsock_unix_redir_connectible:FAIL:1506 > ./test_progs:vsock_unix_redir_connectible:1514: ingress: recv() err, errno=11 > vsock_unix_redir_connectible:FAIL:1514 > ./test_progs:vsock_unix_redir_connectible:1518: ingress: vsock socket map failed, a != b > vsock_unix_redir_connectible:FAIL:1518 > ./test_progs:vsock_unix_redir_connectible:1525: ingress: want pass count 1, have 0 > > It’s because the recv(... MSG_DONTWAIT) syscall in the test case is > called before the queued work sk_psock_backlog() in the kernel finishes > executing. So the data to be read is still queued in psock->ingress_skb > and cannot be read by the user program. Therefore, the non-blocking > recv() reads nothing and reports an EAGAIN error. > > So replace recv(... MSG_DONTWAIT) with xrecv_nonblock(), which calls > select() to wait for data to be readable or timeout before calls recv(). > > Fixes: d61bd8c1fd02 ("selftests/bpf: add a test case for vsock sockmap") > Signed-off-by: Xu Kuohai <xukuohai@huawei.com> This is unfortunately still flaky and showing up from time to time in BPF CI, e.g. a very recent one can be found here: https://github.com/kernel-patches/bpf/actions/runs/6021475685/job/16335248421 [...] Error: #211 sockmap_listen Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected vsock_unix_redir_connectible:FAIL:1501 ./test_progs:vsock_unix_redir_connectible:1501: ingress: write: Transport endpoint is not connected vsock_unix_redir_connectible:FAIL:1501 ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected vsock_unix_redir_connectible:FAIL:1501 Could you continue to look into it to make the test more robust? Thanks a lot, Daniel
On 8/30/2023 4:10 PM, Daniel Borkmann wrote: > Hi Xu, > > On 8/4/23 9:37 AM, Xu Kuohai wrote: >> From: Xu Kuohai <xukuohai@huawei.com> >> >> BPF CI has reported the following failure: >> >> Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir >> Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir >> ./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected >> vsock_unix_redir_connectible:FAIL:1506 >> ./test_progs:vsock_unix_redir_connectible:1506: ingress: write: Transport endpoint is not connected >> vsock_unix_redir_connectible:FAIL:1506 >> ./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected >> vsock_unix_redir_connectible:FAIL:1506 >> ./test_progs:vsock_unix_redir_connectible:1514: ingress: recv() err, errno=11 >> vsock_unix_redir_connectible:FAIL:1514 >> ./test_progs:vsock_unix_redir_connectible:1518: ingress: vsock socket map failed, a != b >> vsock_unix_redir_connectible:FAIL:1518 >> ./test_progs:vsock_unix_redir_connectible:1525: ingress: want pass count 1, have 0 >> >> It’s because the recv(... MSG_DONTWAIT) syscall in the test case is >> called before the queued work sk_psock_backlog() in the kernel finishes >> executing. So the data to be read is still queued in psock->ingress_skb >> and cannot be read by the user program. Therefore, the non-blocking >> recv() reads nothing and reports an EAGAIN error. >> >> So replace recv(... MSG_DONTWAIT) with xrecv_nonblock(), which calls >> select() to wait for data to be readable or timeout before calls recv(). >> >> Fixes: d61bd8c1fd02 ("selftests/bpf: add a test case for vsock sockmap") >> Signed-off-by: Xu Kuohai <xukuohai@huawei.com> > > This is unfortunately still flaky and showing up from time to time in BPF CI, e.g. a > very recent one can be found here: > > https://github.com/kernel-patches/bpf/actions/runs/6021475685/job/16335248421 > > [...] > Error: #211 sockmap_listen > Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir > Error: #211/79 sockmap_listen/sockmap VSOCK test_vsock_redir > ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected > vsock_unix_redir_connectible:FAIL:1501 > ./test_progs:vsock_unix_redir_connectible:1501: ingress: write: Transport endpoint is not connected > vsock_unix_redir_connectible:FAIL:1501 > ./test_progs:vsock_unix_redir_connectible:1501: egress: write: Transport endpoint is not connected > vsock_unix_redir_connectible:FAIL:1501 > > Could you continue to look into it to make the test more robust? > OK, it looks like I only noticed the recv failure and ignored the write failure. I'll take it a look. > Thanks a lot, > Daniel
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index b4f6f3a50ae5..ba35bcc66e7e 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -1432,7 +1432,7 @@ static void vsock_unix_redir_connectible(int sock_mapfd, int verd_mapfd, if (n < 1) goto out; - n = recv(mode == REDIR_INGRESS ? u0 : u1, &b, sizeof(b), MSG_DONTWAIT); + n = xrecv_nonblock(mode == REDIR_INGRESS ? u0 : u1, &b, sizeof(b), 0); if (n < 0) FAIL("%s: recv() err, errno=%d", log_prefix, errno); if (n == 0)