diff mbox series

[net-next,v1] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()

Message ID 20211024235903.371430-1-jmaxwell37@gmail.com (mailing list archive)
State Accepted
Commit cf12e6f9124629b18a6182deefc0315f0a73a199
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v1] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb() | expand

Checks

Context Check Description
netdev/cover_letter success Single patches do not need cover letters
netdev/fixes_present success Fixes tag not required for -next series
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Fixes tag looks correct
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/header_inline success No static functions without inline keyword in header files

Commit Message

Jonathan Maxwell Oct. 24, 2021, 11:59 p.m. UTC
v1: Implement a more general statement as recommended by Eric Dumazet. The 
sequence number will be advanced, so this check will fix the FIN case and 
other cases. 

A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that 
the write_queue was not empty as determined by tcp_write_queue_empty() but the 
sk_buff containing the FIN flag had been freed and the socket was zombied in 
that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.

Some instrumentation was added to the kernel and it was found that there is a 
timing window where tcp_sendmsg() can run after tcp_send_fin().

tcp_sendmsg() will hit an error, for example:

1269 ▹       if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩
1270 ▹       ▹       goto do_error;↩

tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The
TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent.

If the other side sends a FIN packet the socket will transition to CLOSING and
remain that way until the system is rebooted.

Fix this by checking for the FIN flag in the sk_buff and don't free it if that 
is the case. Testing confirmed that fixed the issue.

Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reported-by: Monir Zouaoui <Monir.Zouaoui@mail.schwarz>
Reported-by: Simon Stier <simon.stier@mail.schwarz>
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Dumazet Oct. 25, 2021, 4:13 p.m. UTC | #1
On Sun, Oct 24, 2021 at 4:59 PM Jon Maxwell <jmaxwell37@gmail.com> wrote:
>
> v1: Implement a more general statement as recommended by Eric Dumazet. The
> sequence number will be advanced, so this check will fix the FIN case and
> other cases.
>
> A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that
> the write_queue was not empty as determined by tcp_write_queue_empty() but the
> sk_buff containing the FIN flag had been freed and the socket was zombied in
> that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.
>
> Some instrumentation was added to the kernel and it was found that there is a
> timing window where tcp_sendmsg() can run after tcp_send_fin().
>
> tcp_sendmsg() will hit an error, for example:
>
> 1269 ▹       if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩
> 1270 ▹       ▹       goto do_error;↩
>
> tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The
> TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent.
>
> If the other side sends a FIN packet the socket will transition to CLOSING and
> remain that way until the system is rebooted.
>
> Fix this by checking for the FIN flag in the sk_buff and don't free it if that
> is the case. Testing confirmed that fixed the issue.
>
> Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
> Reported-by: Monir Zouaoui <Monir.Zouaoui@mail.schwarz>
> Reported-by: Simon Stier <simon.stier@mail.schwarz>

Thanks for this fix !
Reviewed-by: Eric Dumazet <edumazet@google.com>
patchwork-bot+netdevbpf@kernel.org Oct. 26, 2021, 12:20 p.m. UTC | #2
Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Mon, 25 Oct 2021 10:59:03 +1100 you wrote:
> v1: Implement a more general statement as recommended by Eric Dumazet. The
> sequence number will be advanced, so this check will fix the FIN case and
> other cases.
> 
> A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that
> the write_queue was not empty as determined by tcp_write_queue_empty() but the
> sk_buff containing the FIN flag had been freed and the socket was zombied in
> that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.
> 
> [...]

Here is the summary with links:
  - [net-next,v1] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
    https://git.kernel.org/netdev/net-next/c/cf12e6f91246

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c2d9830136d2..56ff7c746f88 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -938,7 +938,7 @@  int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
  */
 void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
 {
-	if (skb && !skb->len) {
+	if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) {
 		tcp_unlink_write_queue(skb, sk);
 		if (tcp_write_queue_empty(sk))
 			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);