[bpf,v2,04/12] bpf: sockmap, handle fin correctly

Message ID 20230327175446.98151-5-john.fastabend@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Series bpf sockmap fixes

Checks

Context                     Check    Description
bpf/vmtest-bpf-PR           fail     merge-conflict
netdev/tree_selection       success  Clearly marked for bpf, async
netdev/apply                fail     Patch does not apply to bpf
bpf/vmtest-bpf-VM_Test-1    success  Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-2    success  Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-3    success  Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-4    success  Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-5    success  Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-6    success  Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-7    success  Logs for set-matrix
bpf/vmtest-bpf-VM_Test-8    success  Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9    success  Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-10   success  Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-11   success  Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-12   success  Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-13   fail     Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-14   fail     Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-15   fail     Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-16   fail     Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-17   fail     Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-18   fail     Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-19   fail     Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-20   fail     Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-21   fail     Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22   fail     Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-23   success  Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-24   success  Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-25   success  Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26   success  Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-27   success  Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-28   success  Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-29   success  Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-30   success  Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-31   success  Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-32   success  Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-33   success  Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-34   success  Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-35   success  Logs for test_verifier on x86_64 with llvm-16

Commit Message

John Fastabend March 27, 2023, 5:54 p.m. UTC
The sockmap code returns EAGAIN after a FIN packet is received and no
more data is on the receive queue. The correct behavior is to return 0
to the user, who can then close the socket. The EAGAIN causes many apps
to retry, which masks the problem. Eventually the socket is evicted from
the sockmap because it is released from the sockmap sock free handling.
The issue creates a delay and can cause errors on the application side.

To fix this, check on the sk_msg_recvmsg side whether the length is zero
and the FIN flag is set, and if so set the return value to zero. A
selftest will be added to check this condition.

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Tested-by: William Findlay <will@isovalent.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/ipv4/tcp_bpf.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
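
Note: from the application's side, the difference between a return of 0 and EAGAIN can be sketched as below. This is a minimal userspace illustration, not part of the patch. It uses an AF_UNIX socketpair as a stand-in for a TCP socket in a sockmap, and the names classify_recv, drain_until_eof and demo_fin_is_eof are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of one non-blocking read attempt (illustrative names). */
enum read_outcome { READ_DATA, READ_EOF, READ_RETRY, READ_ERROR };

static enum read_outcome classify_recv(ssize_t n, int err)
{
	if (n > 0)
		return READ_DATA;
	if (n == 0)
		return READ_EOF;	/* orderly shutdown: caller may close */
	if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
		return READ_RETRY;	/* no data yet: caller tries again */
	return READ_ERROR;
}

/* Read until EOF. Returns total bytes read, or -1 on error or when the
 * retry budget runs out. A real application would poll() instead of
 * spinning; the budget only keeps the sketch bounded.
 */
static ssize_t drain_until_eof(int fd, int max_retries)
{
	char buf[256];
	ssize_t total = 0;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		switch (classify_recv(n, errno)) {
		case READ_DATA:
			total += n;
			break;
		case READ_EOF:
			return total;
		case READ_RETRY:
			if (max_retries-- <= 0)
				return -1;
			break;
		default:
			return -1;
		}
	}
}

/* Send 5 bytes, shut down the write side, then drain the peer. */
static ssize_t demo_fin_is_eof(void)
{
	int sv[2];
	ssize_t total;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (send(sv[0], "hello", 5, 0) != 5) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	shutdown(sv[0], SHUT_WR);
	total = drain_until_eof(sv[1], 8);
	close(sv[0]);
	close(sv[1]);
	return total;
}
```

A reader that receives EAGAIN where 0 was expected stays in the READ_RETRY branch, which matches the retry behavior the commit message describes.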

Comments

Jakub Sitnicki April 3, 2023, 11:11 a.m. UTC | #1
On Mon, Mar 27, 2023 at 10:54 AM -07, John Fastabend wrote:
> The sockmap code returns EAGAIN after a FIN packet is received and no
> more data is on the receive queue. The correct behavior is to return 0
> to the user, who can then close the socket. The EAGAIN causes many apps
> to retry, which masks the problem. Eventually the socket is evicted from
> the sockmap because it is released from the sockmap sock free handling.
> The issue creates a delay and can cause errors on the application side.
>
> To fix this, check on the sk_msg_recvmsg side whether the length is zero
> and the FIN flag is set, and if so set the return value to zero. A
> selftest will be added to check this condition.
>
> Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
> Tested-by: William Findlay <will@isovalent.com>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  net/ipv4/tcp_bpf.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index cf26d65ca389..3a0f43f3afd8 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c

[...]

> @@ -193,6 +211,19 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
>  	lock_sock(sk);
>  msg_bytes_ready:
>  	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
> +	/* The typical case for EFAULT is that the socket was gracefully
> +	 * shut down with a FIN pkt, so check for that here. The other case
> +	 * is an error in copy_page_to_iter(), which would be unexpected.
> +	 * On FIN, correct the return code to zero.
> +	 */
> +	if (copied == -EFAULT) {
> +		bool is_fin = is_next_msg_fin(psock);
> +
> +		if (is_fin) {
> +			copied = 0;
> +			goto out;
> +		}
> +	}
>  	if (!copied) {
>  		long timeo;
>  		int data;

tcp_bpf_recvmsg needs a similar fix, no?

John Fastabend April 3, 2023, 9:05 p.m. UTC | #2
Jakub Sitnicki wrote:
> On Mon, Mar 27, 2023 at 10:54 AM -07, John Fastabend wrote:
> > The sockmap code returns EAGAIN after a FIN packet is received and no
> > more data is on the receive queue. The correct behavior is to return 0
> > to the user, who can then close the socket. The EAGAIN causes many apps
> > to retry, which masks the problem. Eventually the socket is evicted from
> > the sockmap because it is released from the sockmap sock free handling.
> > The issue creates a delay and can cause errors on the application side.
> >
> > To fix this, check on the sk_msg_recvmsg side whether the length is zero
> > and the FIN flag is set, and if so set the return value to zero. A
> > selftest will be added to check this condition.
> >
> > Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
> > Tested-by: William Findlay <will@isovalent.com>
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > ---
> >  net/ipv4/tcp_bpf.c | 31 +++++++++++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> > index cf26d65ca389..3a0f43f3afd8 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> 
> [...]
> 
> > @@ -193,6 +211,19 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
> >  	lock_sock(sk);
> >  msg_bytes_ready:
> >  	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
> > +	/* The typical case for EFAULT is that the socket was gracefully
> > +	 * shut down with a FIN pkt, so check for that here. The other case
> > +	 * is an error in copy_page_to_iter(), which would be unexpected.
> > +	 * On FIN, correct the return code to zero.
> > +	 */
> > +	if (copied == -EFAULT) {
> > +		bool is_fin = is_next_msg_fin(psock);
> > +
> > +		if (is_fin) {
> > +			copied = 0;
> > +			goto out;
> > +		}
> > +	}
> >  	if (!copied) {
> >  		long timeo;
> >  		int data;
> 
> tcp_bpf_recvmsg needs a similar fix, no?

Yes, I had lumped it in with the follow-up fixes needed for the
stream parser case, but you're right, it's not related.

Mind if I do it in a follow-up? Or if I need to do a v4, I'll
roll it in there.

Thanks!
John

Jakub Sitnicki April 4, 2023, 10:11 a.m. UTC | #3
On Mon, Apr 03, 2023 at 02:05 PM -07, John Fastabend wrote:
> Jakub Sitnicki wrote:

[...]

>> tcp_bpf_recvmsg needs a similar fix, no?
>
> Yes, I had lumped it in with the follow-up fixes needed for the
> stream parser case, but you're right, it's not related.
>
> Mind if I do it in a follow-up? Or if I need to do a v4, I'll
> roll it in there.

That works. Totally your call.

Patch

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index cf26d65ca389..3a0f43f3afd8 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -174,6 +174,24 @@  static int tcp_msg_wait_data(struct sock *sk, struct sk_psock *psock,
 	return ret;
 }
 
+static bool is_next_msg_fin(struct sk_psock *psock)
+{
+	struct scatterlist *sge;
+	struct sk_msg *msg_rx;
+	int i;
+
+	msg_rx = sk_psock_peek_msg(psock);
+	i = msg_rx->sg.start;
+	sge = sk_msg_elem(msg_rx, i);
+	if (!sge->length) {
+		struct sk_buff *skb = msg_rx->skb;
+
+		if (skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+			return true;
+	}
+	return false;
+}
+
 static int tcp_bpf_recvmsg_parser(struct sock *sk,
 				  struct msghdr *msg,
 				  size_t len,
@@ -193,6 +211,19 @@  static int tcp_bpf_recvmsg_parser(struct sock *sk,
 	lock_sock(sk);
 msg_bytes_ready:
 	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+	/* The typical case for EFAULT is that the socket was gracefully
+	 * shut down with a FIN pkt, so check for that here. The other case
+	 * is an error in copy_page_to_iter(), which would be unexpected.
+	 * On FIN, correct the return code to zero.
+	 */
+	if (copied == -EFAULT) {
+		bool is_fin = is_next_msg_fin(psock);
+
+		if (is_fin) {
+			copied = 0;
+			goto out;
+		}
+	}
 	if (!copied) {
 		long timeo;
 		int data;
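
Note: the decision made by the second hunk can be modelled in userspace as below. This is a simplified sketch under stated assumptions, not kernel code. struct model_msg, MODEL_TCPHDR_FIN, model_next_msg_is_fin and model_fixup_copied are hypothetical stand-ins for the ingress message, the FIN flag bit, is_next_msg_fin() and the added if-block. The NULL guard exists only in the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_TCPHDR_FIN 0x01	/* FIN bit of the TCP flags byte */

/* Hypothetical stand-in for the message at the head of the ingress queue. */
struct model_msg {
	unsigned int first_sge_len;	/* length of the first scatterlist element */
	bool has_skb;			/* whether an skb is attached to the message */
	unsigned char tcp_flags;	/* TCP flags recorded for that skb */
};

/* Mirrors the shape of is_next_msg_fin(): a zero-length first element
 * whose skb carries the FIN flag.
 */
static bool model_next_msg_is_fin(const struct model_msg *m)
{
	if (!m)
		return false;	/* guard added for the model only */
	return m->first_sge_len == 0 && m->has_skb &&
	       (m->tcp_flags & MODEL_TCPHDR_FIN) != 0;
}

/* Mirrors the added block: -EFAULT becomes 0 only when the next message
 * is a FIN; every other value passes through unchanged.
 */
static int model_fixup_copied(int copied, const struct model_msg *m)
{
	if (copied == -EFAULT && model_next_msg_is_fin(m))
		return 0;
	return copied;
}
```

In the model, as in the hunk, a genuine copy error with no FIN pending is still reported as -EFAULT rather than being turned into an end-of-stream.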