
[bpf-next] bpf, sockmap: fix return codes from tcp_bpf_recvmsg_parser()

Message ID 20220104205918.286416-1-john.fastabend@gmail.com (mailing list archive)
State Accepted
Commit 5b2c5540b8110eea0d67a78fb0ddb9654c58daeb
Delegated to: BPF
Headers show
Series [bpf-next] bpf, sockmap: fix return codes from tcp_bpf_recvmsg_parser() | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers fail 1 blamed authors not CCed: jakub@cloudflare.com; 12 maintainers not CCed: andrii@kernel.org kuba@kernel.org kpsingh@kernel.org kafai@fb.com lmb@cloudflare.com songliubraving@fb.com dsahern@kernel.org jakub@cloudflare.com edumazet@google.com yhs@fb.com yoshfuji@linux-ipv6.org davem@davemloft.net
netdev/build_clang success Errors and warnings before: 20 this patch: 20
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 39 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next success VM_Test

Commit Message

John Fastabend Jan. 4, 2022, 8:59 p.m. UTC
Applications can be slightly confused because we do not always return the
error code the TCP stack normally would. For example, on a socket error
(sk->sk_err set) we return EAGAIN instead of the sock_error() value. This
usually means the application will 'try again' instead of aborting
immediately. Another example: when a shutdown event is received we should
abort immediately instead of waiting for data until the user-provided
timeout expires.

These tend not to be fatal and applications usually recover, but they
surface bogus errors to the user or introduce unexpected latency. Before
commit c5d2177a72a16 we fell back to the TCP stack when no data was
available, so we managed to catch many of these cases, although with the
extra latency cost of calling tcp_msg_wait_data() first.

To fix this, duplicate the error handling from the TCP stack into tcp_bpf
so that we return the same error codes.

These were found in our CI tests that run applications against sockmap
and do longer-lived testing, at least compared to test_sockmap which
does short-lived ping/pong tests, and in some of the test clusters
we deploy.

It's non-trivial to reproduce these in the shorter-form CI tests
appropriate for BPF selftests, but we are looking into it so we can
ensure this keeps working going forward. As a preview, one idea is to
pull in packetdrill testing, which catches some of this.

Fixes: c5d2177a72a16 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/ipv4/tcp_bpf.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

Comments

John Fastabend Jan. 4, 2022, 9:07 p.m. UTC | #1
John Fastabend wrote:
> Applications can be slightly confused because we do not always return the
> error code the TCP stack normally would. For example, on a socket error
> (sk->sk_err set) we return EAGAIN instead of the sock_error() value. This
> usually means the application will 'try again' instead of aborting
> immediately. Another example: when a shutdown event is received we should
> abort immediately instead of waiting for data until the user-provided
> timeout expires.
> 
> These tend not to be fatal and applications usually recover, but they
> surface bogus errors to the user or introduce unexpected latency. Before
> commit c5d2177a72a16 we fell back to the TCP stack when no data was
> available, so we managed to catch many of these cases, although with the
> extra latency cost of calling tcp_msg_wait_data() first.
> 
> To fix this, duplicate the error handling from the TCP stack into tcp_bpf
> so that we return the same error codes.
> 
> These were found in our CI tests that run applications against sockmap
> and do longer-lived testing, at least compared to test_sockmap which
> does short-lived ping/pong tests, and in some of the test clusters
> we deploy.
> 
> It's non-trivial to reproduce these in the shorter-form CI tests
> appropriate for BPF selftests, but we are looking into it so we can
> ensure this keeps working going forward. As a preview, one idea is to
> pull in packetdrill testing, which catches some of this.
> 
> Fixes: c5d2177a72a16 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---

Forgot to add a note: I marked this for bpf-next given we are in rc8. It
is a fix, but I assume we only want critical things at this point.
Anyway, it applies to both bpf and bpf-next, so it can be taken in either
place.

Thanks,
John
patchwork-bot+netdevbpf@kernel.org Jan. 5, 2022, 7:50 p.m. UTC | #2
Hello:

This patch was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Tue,  4 Jan 2022 12:59:18 -0800 you wrote:
> Applications can be slightly confused because we do not always return the
> error code the TCP stack normally would. For example, on a socket error
> (sk->sk_err set) we return EAGAIN instead of the sock_error() value. This
> usually means the application will 'try again' instead of aborting
> immediately. Another example: when a shutdown event is received we should
> abort immediately instead of waiting for data until the user-provided
> timeout expires.
> 
> [...]

Here is the summary with links:
  - [bpf-next] bpf, sockmap: fix return codes from tcp_bpf_recvmsg_parser()
    https://git.kernel.org/bpf/bpf-next/c/5b2c5540b811

You are awesome, thank you!

Patch

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index f70aa0932bd6..9b9b02052fd3 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -196,12 +196,39 @@  static int tcp_bpf_recvmsg_parser(struct sock *sk,
 		long timeo;
 		int data;
 
+		if (sock_flag(sk, SOCK_DONE))
+			goto out;
+
+		if (sk->sk_err) {
+			copied = sock_error(sk);
+			goto out;
+		}
+
+		if (sk->sk_shutdown & RCV_SHUTDOWN)
+			goto out;
+
+		if (sk->sk_state == TCP_CLOSE) {
+			copied = -ENOTCONN;
+			goto out;
+		}
+
 		timeo = sock_rcvtimeo(sk, nonblock);
+		if (!timeo) {
+			copied = -EAGAIN;
+			goto out;
+		}
+
+		if (signal_pending(current)) {
+			copied = sock_intr_errno(timeo);
+			goto out;
+		}
+
 		data = tcp_msg_wait_data(sk, psock, timeo);
 		if (data && !sk_psock_queue_empty(psock))
 			goto msg_bytes_ready;
 		copied = -EAGAIN;
 	}
+out:
 	release_sock(sk);
 	sk_psock_put(sk, psock);
 	return copied;