Message ID | 20230327175446.98151-5-john.fastabend@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Series | bpf sockmap fixes |
On Mon, Mar 27, 2023 at 10:54 AM -07, John Fastabend wrote:
> The sockmap code is returning EAGAIN after a FIN packet is received and no
> more data is on the receive queue. Correct behavior is to return 0 to the
> user and the user can then close the socket. The EAGAIN causes many apps
> to retry which masks the problem. Eventually the socket is evicted from
> the sockmap because its released from sockmap sock free handling. The
> issue creates a delay and can cause some errors on application side.
>
> To fix this check on sk_msg_recvmsg side if length is zero and FIN flag
> is set then set return to zero. A selftest will be added to check this
> condition.
>
> Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
> Tested-by: William Findlay <will@isovalent.com>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  net/ipv4/tcp_bpf.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index cf26d65ca389..3a0f43f3afd8 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c

[...]

> @@ -193,6 +211,19 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
>  	lock_sock(sk);
>  msg_bytes_ready:
>  	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
> +	/* The typical case for EFAULT is the socket was gracefully
> +	 * shutdown with a FIN pkt. So check here the other case is
> +	 * some error on copy_page_to_iter which would be unexpected.
> +	 * On fin return correct return code to zero.
> +	 */
> +	if (copied == -EFAULT) {
> +		bool is_fin = is_next_msg_fin(psock);
> +
> +		if (is_fin) {
> +			copied = 0;
> +			goto out;
> +		}
> +	}
>  	if (!copied) {
>  		long timeo;
>  		int data;

tcp_bpf_recvmsg needs a similar fix, no?
Jakub Sitnicki wrote:
> On Mon, Mar 27, 2023 at 10:54 AM -07, John Fastabend wrote:

[...]

> tcp_bpf_recvmsg needs a similar fix, no?

Yes, I had lumped it in with follow-up fixes needed for the
stream parser case, but you're right, it's not related.

Mind if I do it in a follow-up? Or if I need to do a v4 I'll
roll it in there.

Thanks!
John
On Mon, Apr 03, 2023 at 02:05 PM -07, John Fastabend wrote:
> Jakub Sitnicki wrote:

[...]

>> tcp_bpf_recvmsg needs a similar fix, no?
>
> Yes, I had lumped it in with follow-up fixes needed for the
> stream parser case, but you're right, it's not related.
>
> Mind if I do it in a follow-up? Or if I need to do a v4 I'll
> roll it in there.

That works. Totally your call.
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index cf26d65ca389..3a0f43f3afd8 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -174,6 +174,24 @@ static int tcp_msg_wait_data(struct sock *sk, struct sk_psock *psock,
 	return ret;
 }
 
+static bool is_next_msg_fin(struct sk_psock *psock)
+{
+	struct scatterlist *sge;
+	struct sk_msg *msg_rx;
+	int i;
+
+	msg_rx = sk_psock_peek_msg(psock);
+	i = msg_rx->sg.start;
+	sge = sk_msg_elem(msg_rx, i);
+	if (!sge->length) {
+		struct sk_buff *skb = msg_rx->skb;
+
+		if (skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+			return true;
+	}
+	return false;
+}
+
 static int tcp_bpf_recvmsg_parser(struct sock *sk,
 				  struct msghdr *msg,
 				  size_t len,
@@ -193,6 +211,19 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
 	lock_sock(sk);
 msg_bytes_ready:
 	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+	/* The typical case for EFAULT is the socket was gracefully
+	 * shutdown with a FIN pkt. So check here the other case is
+	 * some error on copy_page_to_iter which would be unexpected.
+	 * On fin return correct return code to zero.
+	 */
+	if (copied == -EFAULT) {
+		bool is_fin = is_next_msg_fin(psock);
+
+		if (is_fin) {
+			copied = 0;
+			goto out;
+		}
+	}
 	if (!copied) {
 		long timeo;
 		int data;