[net] tcp: fix forever orphan socket caused by tcp_abort

Message ID	20240801111611.84743-1-kuro@kuroa.me (mailing list archive)
State	Superseded
Delegated to:	Netdev Maintainers
Headers	show Received: from pv50p00im-ztdg10011201.me.com (pv50p00im-ztdg10011201.me.com [17.58.6.39]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C52648F70 for <netdev@vger.kernel.org>; Thu, 1 Aug 2024 11:17:43 +0000 (UTC) From: Xueming Feng <kuro@kuroa.me> To: "David S . Miller" <davem@davemloft.net>, netdev@vger.kernel.org Cc: Eric Dumazet <edumazet@google.com>, Neal Cardwell <ncardwell@google.com>, Yuchung Cheng <ycheng@google.com>, Soheil Hassas Yeganeh <soheil@google.com>, David Ahern <dsahern@kernel.org>, linux-kernel@vger.kernel.org, Xueming Feng <kuro@kuroa.me> Subject: [PATCH net] tcp: fix forever orphan socket caused by tcp_abort Date: Thu, 1 Aug 2024 19:16:11 +0800 Message-Id: <20240801111611.84743-1-kuro@kuroa.me> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[net] tcp: fix forever orphan socket caused by tcp_abort \| expand [net] tcp: fix forever orphan socket caused by tcp_abort

Context	Check	Description
netdev/series_format	success	Single patches do not need cover letters
netdev/tree_selection	success	Clearly marked for net
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag present in non-next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 42 this patch: 42
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	warning	2 maintainers not CCed: kuba@kernel.org pabeni@redhat.com
netdev/build_clang	success	Errors and warnings before: 43 this patch: 43
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	success	Errors and warnings before: 43 this patch: 43
netdev/checkpatch	warning	CHECK: Alignment should match open parenthesis
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 1 this patch: 1
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-08-05--12-00 (tests: 706)

Xueming Feng Aug. 1, 2024, 11:16 a.m. UTC

We have some problem closing zero-window fin-wait-1 tcp sockets in our 
environment. This patch come from the investigation.

Previously tcp_abort only sends out reset and calls tcp_done when the 
socket is not SOCK_DEAD aka. orphan. For orphan socket, it will only 
purging the write queue, but not close the socket and left it to the 
timer.

While purging the write queue, tp->packets_out and sk->sk_write_queue 
is cleared along the way. However tcp_retransmit_timer have early 
return based on !tp->packets_out and tcp_probe_timer have early 
return based on !sk->sk_write_queue.

This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched 
and socket not being killed by the timers. Converting a zero-windowed
orphan to a forever orphan.

This patch removes the SOCK_DEAD check in tcp_abort, making it send 
reset to peer and close the socket accordingly. Preventing the 
timer-less orphan from happening.

Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
Signed-off-by: Xueming Feng <kuro@kuroa.me>
---
 net/ipv4/tcp.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Eric Dumazet Aug. 1, 2024, 1:11 p.m. UTC | #1

On Thu, Aug 1, 2024 at 1:17 PM Xueming Feng <kuro@kuroa.me> wrote:
>
> We have some problem closing zero-window fin-wait-1 tcp sockets in our
> environment. This patch come from the investigation.
>
> Previously tcp_abort only sends out reset and calls tcp_done when the
> socket is not SOCK_DEAD aka. orphan. For orphan socket, it will only
> purging the write queue, but not close the socket and left it to the
> timer.
>
> While purging the write queue, tp->packets_out and sk->sk_write_queue
> is cleared along the way. However tcp_retransmit_timer have early
> return based on !tp->packets_out and tcp_probe_timer have early
> return based on !sk->sk_write_queue.
>
> This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
> and socket not being killed by the timers. Converting a zero-windowed
> orphan to a forever orphan.
>
> This patch removes the SOCK_DEAD check in tcp_abort, making it send
> reset to peer and close the socket accordingly. Preventing the
> timer-less orphan from happening.
>
> Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
> Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
> Signed-off-by: Xueming Feng <kuro@kuroa.me>

This seems legit, but are you sure these two blamed commits added this bug ?

Even before them, we should have called tcp_done() right away, instead
of waiting for a (possibly long) timer to complete the job.

This might be important when killing millions of sockets on a busy server.

CC Lorenzo

Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?

Thanks.

Jason Xing Aug. 3, 2024, 3:48 p.m. UTC | #2

Hello Eric,

On Thu, Aug 1, 2024 at 9:17 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Aug 1, 2024 at 1:17 PM Xueming Feng <kuro@kuroa.me> wrote:
> >
> > We have some problem closing zero-window fin-wait-1 tcp sockets in our
> > environment. This patch come from the investigation.
> >
> > Previously tcp_abort only sends out reset and calls tcp_done when the
> > socket is not SOCK_DEAD aka. orphan. For orphan socket, it will only
> > purging the write queue, but not close the socket and left it to the
> > timer.
> >
> > While purging the write queue, tp->packets_out and sk->sk_write_queue
> > is cleared along the way. However tcp_retransmit_timer have early
> > return based on !tp->packets_out and tcp_probe_timer have early
> > return based on !sk->sk_write_queue.
> >
> > This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
> > and socket not being killed by the timers. Converting a zero-windowed
> > orphan to a forever orphan.
> >
> > This patch removes the SOCK_DEAD check in tcp_abort, making it send
> > reset to peer and close the socket accordingly. Preventing the
> > timer-less orphan from happening.
> >
> > Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
> > Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
> > Signed-off-by: Xueming Feng <kuro@kuroa.me>
>
> This seems legit, but are you sure these two blamed commits added this bug ?
>
> Even before them, we should have called tcp_done() right away, instead
> of waiting for a (possibly long) timer to complete the job.
>
> This might be important when killing millions of sockets on a busy server.
>
> CC Lorenzo
>
> Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?

I guess that one of possible reasons is to avoid double-free,
something like this, happening in inet_csk_destroy_sock().

Let me assume: if we call tcp_close() first under the memory pressure
which means tcp_check_oom() returns true and then it will call
inet_csk_destroy_sock() in __tcp_close(), later tcp_abort() will call
tcp_done() to free the sk again in the inet_csk_destroy_sock() when
not testing the SOCK_DEAD flag in tcp_abort.

Do you think the above case could happen?

Thanks,
Jason

Jason Xing Aug. 5, 2024, 4:52 a.m. UTC | #3

On Sat, Aug 3, 2024 at 11:48 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Thu, Aug 1, 2024 at 9:17 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Aug 1, 2024 at 1:17 PM Xueming Feng <kuro@kuroa.me> wrote:
> > >
> > > We have some problem closing zero-window fin-wait-1 tcp sockets in our
> > > environment. This patch come from the investigation.
> > >
> > > Previously tcp_abort only sends out reset and calls tcp_done when the
> > > socket is not SOCK_DEAD aka. orphan. For orphan socket, it will only
> > > purging the write queue, but not close the socket and left it to the
> > > timer.
> > >
> > > While purging the write queue, tp->packets_out and sk->sk_write_queue
> > > is cleared along the way. However tcp_retransmit_timer have early
> > > return based on !tp->packets_out and tcp_probe_timer have early
> > > return based on !sk->sk_write_queue.
> > >
> > > This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
> > > and socket not being killed by the timers. Converting a zero-windowed
> > > orphan to a forever orphan.
> > >
> > > This patch removes the SOCK_DEAD check in tcp_abort, making it send
> > > reset to peer and close the socket accordingly. Preventing the
> > > timer-less orphan from happening.
> > >
> > > Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
> > > Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
> > > Signed-off-by: Xueming Feng <kuro@kuroa.me>
> >
> > This seems legit, but are you sure these two blamed commits added this bug ?
> >
> > Even before them, we should have called tcp_done() right away, instead
> > of waiting for a (possibly long) timer to complete the job.
> >
> > This might be important when killing millions of sockets on a busy server.
> >
> > CC Lorenzo
> >
> > Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?
>
> I guess that one of possible reasons is to avoid double-free,
> something like this, happening in inet_csk_destroy_sock().
>
> Let me assume: if we call tcp_close() first under the memory pressure
> which means tcp_check_oom() returns true and then it will call
> inet_csk_destroy_sock() in __tcp_close(), later tcp_abort() will call
> tcp_done() to free the sk again in the inet_csk_destroy_sock() when
> not testing the SOCK_DEAD flag in tcp_abort.
>

How about this one which can prevent double calling
inet_csk_destroy_sock() when we call destroy and close nearly at the
same time under that circumstance:

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e03a342c9162..d5d3b21cc824 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4646,7 +4646,7 @@ int tcp_abort(struct sock *sk, int err)
        local_bh_disable();
        bh_lock_sock(sk);

-       if (!sock_flag(sk, SOCK_DEAD)) {
+       if (sk->sk_state != TCP_CLOSE) {
                if (tcp_need_reset(sk->sk_state))
                        tcp_send_active_reset(sk, GFP_ATOMIC,
                                              SK_RST_REASON_NOT_SPECIFIED);

Each time we call inet_csk_destroy_sock(), we must make sure we've
already set the state to TCP_CLOSE. Based on this, I think we can use
this as an indicator to avoid calling twice to destroy the socket.

Thanks,
Jason

Lorenzo Colitti Aug. 5, 2024, 6:42 a.m. UTC | #4

On Thu, Aug 1, 2024 at 10:11 PM Eric Dumazet <edumazet@google.com> wrote:
> > This patch removes the SOCK_DEAD check in tcp_abort, making it send
> > reset to peer and close the socket accordingly. Preventing the
> > timer-less orphan from happening.
> > [...]
>
> This seems legit, but are you sure these two blamed commits added this bug ?
>
> Even before them, we should have called tcp_done() right away, instead
> of waiting for a (possibly long) timer to complete the job.
>
> This might be important when killing millions of sockets on a busy server.
>
> CC Lorenzo
>
> Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?

I think I took it from the original tcp_nuke_addr implementation that
Android used before SOCK_DESTROY and tcp_abort were written. The
oldest reference I could find to that code is this commit that went
into 2.6.39 (!), which already had that check.

https://android.googlesource.com/kernel/common/+/06611218f86dc353d5dd0cb5acac32a0863a2ae5

I expect the check was intended to prevent force-closing the same socket twice.

Eric Dumazet Aug. 5, 2024, 7:22 a.m. UTC | #5

On Mon, Aug 5, 2024 at 6:52 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sat, Aug 3, 2024 at 11:48 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Thu, Aug 1, 2024 at 9:17 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Thu, Aug 1, 2024 at 1:17 PM Xueming Feng <kuro@kuroa.me> wrote:
> > > >
> > > > We have some problem closing zero-window fin-wait-1 tcp sockets in our
> > > > environment. This patch come from the investigation.
> > > >
> > > > Previously tcp_abort only sends out reset and calls tcp_done when the
> > > > socket is not SOCK_DEAD aka. orphan. For orphan socket, it will only
> > > > purging the write queue, but not close the socket and left it to the
> > > > timer.
> > > >
> > > > While purging the write queue, tp->packets_out and sk->sk_write_queue
> > > > is cleared along the way. However tcp_retransmit_timer have early
> > > > return based on !tp->packets_out and tcp_probe_timer have early
> > > > return based on !sk->sk_write_queue.
> > > >
> > > > This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
> > > > and socket not being killed by the timers. Converting a zero-windowed
> > > > orphan to a forever orphan.
> > > >
> > > > This patch removes the SOCK_DEAD check in tcp_abort, making it send
> > > > reset to peer and close the socket accordingly. Preventing the
> > > > timer-less orphan from happening.
> > > >
> > > > Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
> > > > Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
> > > > Signed-off-by: Xueming Feng <kuro@kuroa.me>
> > >
> > > This seems legit, but are you sure these two blamed commits added this bug ?
> > >
> > > Even before them, we should have called tcp_done() right away, instead
> > > of waiting for a (possibly long) timer to complete the job.
> > >
> > > This might be important when killing millions of sockets on a busy server.
> > >
> > > CC Lorenzo
> > >
> > > Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?
> >
> > I guess that one of possible reasons is to avoid double-free,
> > something like this, happening in inet_csk_destroy_sock().
> >
> > Let me assume: if we call tcp_close() first under the memory pressure
> > which means tcp_check_oom() returns true and then it will call
> > inet_csk_destroy_sock() in __tcp_close(), later tcp_abort() will call
> > tcp_done() to free the sk again in the inet_csk_destroy_sock() when
> > not testing the SOCK_DEAD flag in tcp_abort.
> >
>
> How about this one which can prevent double calling
> inet_csk_destroy_sock() when we call destroy and close nearly at the
> same time under that circumstance:
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e03a342c9162..d5d3b21cc824 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4646,7 +4646,7 @@ int tcp_abort(struct sock *sk, int err)
>         local_bh_disable();
>         bh_lock_sock(sk);
>
> -       if (!sock_flag(sk, SOCK_DEAD)) {
> +       if (sk->sk_state != TCP_CLOSE) {
>                 if (tcp_need_reset(sk->sk_state))
>                         tcp_send_active_reset(sk, GFP_ATOMIC,
>                                               SK_RST_REASON_NOT_SPECIFIED);
>
> Each time we call inet_csk_destroy_sock(), we must make sure we've
> already set the state to TCP_CLOSE. Based on this, I think we can use
> this as an indicator to avoid calling twice to destroy the socket.

I do not think this will work.

With this patch, a listener socket will not get an error notification.

Ideally we need tests for this seldom used feature.

Jason Xing Aug. 5, 2024, 7:46 a.m. UTC | #6

On Mon, Aug 5, 2024 at 2:43 PM Lorenzo Colitti <lorenzo@google.com> wrote:
>
> On Thu, Aug 1, 2024 at 10:11 PM Eric Dumazet <edumazet@google.com> wrote:
> > > This patch removes the SOCK_DEAD check in tcp_abort, making it send
> > > reset to peer and close the socket accordingly. Preventing the
> > > timer-less orphan from happening.
> > > [...]
> >
> > This seems legit, but are you sure these two blamed commits added this bug ?
> >
> > Even before them, we should have called tcp_done() right away, instead
> > of waiting for a (possibly long) timer to complete the job.
> >
> > This might be important when killing millions of sockets on a busy server.
> >
> > CC Lorenzo
> >
> > Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?
>
> I think I took it from the original tcp_nuke_addr implementation that
> Android used before SOCK_DESTROY and tcp_abort were written. The
> oldest reference I could find to that code is this commit that went
> into 2.6.39 (!), which already had that check.
>
> https://android.googlesource.com/kernel/common/+/06611218f86dc353d5dd0cb5acac32a0863a2ae5
>
> I expect the check was intended to prevent force-closing the same socket twice.
>

Yes, I guess so.

Jason Xing Aug. 5, 2024, 8:04 a.m. UTC | #7

On Mon, Aug 5, 2024 at 3:23 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Aug 5, 2024 at 6:52 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Sat, Aug 3, 2024 at 11:48 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > Hello Eric,
> > >
> > > On Thu, Aug 1, 2024 at 9:17 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Thu, Aug 1, 2024 at 1:17 PM Xueming Feng <kuro@kuroa.me> wrote:
> > > > >
> > > > > We have some problem closing zero-window fin-wait-1 tcp sockets in our
> > > > > environment. This patch come from the investigation.
> > > > >
> > > > > Previously tcp_abort only sends out reset and calls tcp_done when the
> > > > > socket is not SOCK_DEAD aka. orphan. For orphan socket, it will only
> > > > > purging the write queue, but not close the socket and left it to the
> > > > > timer.
> > > > >
> > > > > While purging the write queue, tp->packets_out and sk->sk_write_queue
> > > > > is cleared along the way. However tcp_retransmit_timer have early
> > > > > return based on !tp->packets_out and tcp_probe_timer have early
> > > > > return based on !sk->sk_write_queue.
> > > > >
> > > > > This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
> > > > > and socket not being killed by the timers. Converting a zero-windowed
> > > > > orphan to a forever orphan.
> > > > >
> > > > > This patch removes the SOCK_DEAD check in tcp_abort, making it send
> > > > > reset to peer and close the socket accordingly. Preventing the
> > > > > timer-less orphan from happening.
> > > > >
> > > > > Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
> > > > > Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
> > > > > Signed-off-by: Xueming Feng <kuro@kuroa.me>
> > > >
> > > > This seems legit, but are you sure these two blamed commits added this bug ?
> > > >
> > > > Even before them, we should have called tcp_done() right away, instead
> > > > of waiting for a (possibly long) timer to complete the job.
> > > >
> > > > This might be important when killing millions of sockets on a busy server.
> > > >
> > > > CC Lorenzo
> > > >
> > > > Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?
> > >
> > > I guess that one of possible reasons is to avoid double-free,
> > > something like this, happening in inet_csk_destroy_sock().
> > >
> > > Let me assume: if we call tcp_close() first under the memory pressure
> > > which means tcp_check_oom() returns true and then it will call
> > > inet_csk_destroy_sock() in __tcp_close(), later tcp_abort() will call
> > > tcp_done() to free the sk again in the inet_csk_destroy_sock() when
> > > not testing the SOCK_DEAD flag in tcp_abort.
> > >
> >
> > How about this one which can prevent double calling
> > inet_csk_destroy_sock() when we call destroy and close nearly at the
> > same time under that circumstance:
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index e03a342c9162..d5d3b21cc824 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -4646,7 +4646,7 @@ int tcp_abort(struct sock *sk, int err)
> >         local_bh_disable();
> >         bh_lock_sock(sk);
> >
> > -       if (!sock_flag(sk, SOCK_DEAD)) {
> > +       if (sk->sk_state != TCP_CLOSE) {
> >                 if (tcp_need_reset(sk->sk_state))
> >                         tcp_send_active_reset(sk, GFP_ATOMIC,
> >                                               SK_RST_REASON_NOT_SPECIFIED);
> >
> > Each time we call inet_csk_destroy_sock(), we must make sure we've
> > already set the state to TCP_CLOSE. Based on this, I think we can use
> > this as an indicator to avoid calling twice to destroy the socket.
>
> I do not think this will work.
>
> With this patch, a listener socket will not get an error notification.

Oh, you're right.

I think we can add this particular case in the if or if-else statement
to handle.

Thanks,
Jason

Lorenzo Colitti Aug. 14, 2024, 4:43 a.m. UTC | #8

On Mon, Aug 5, 2024 at 4:23 PM Eric Dumazet <edumazet@google.com> wrote:
> > Each time we call inet_csk_destroy_sock(), we must make sure we've
> > already set the state to TCP_CLOSE. Based on this, I think we can use
> > this as an indicator to avoid calling twice to destroy the socket.
>
> I do not think this will work.
>
> With this patch, a listener socket will not get an error notification.
>
> Ideally we need tests for this seldom used feature.

FWIW there is a fair amount of test coverage here:

https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/sock_diag_test.py

though unfortunately they don't pass on unmodified kernels (I didn't
look into why - maybe Maciej knows). I ran the tests on the "v2-ish
patch" and they all passed except for a test that expects that
SOCK_DESTROY on a FIN_WAIT1 socket does nothing. That seems OK because
it's the thing your patch is trying to fix.

Just to confirm - it's OK to send a RST on a connection that's already
in FIN_WAIT1 state? Is that allowed by the RFC?

Jason Xing Aug. 14, 2024, 4:56 a.m. UTC | #9

On Wed, Aug 14, 2024 at 12:43 PM Lorenzo Colitti <lorenzo@google.com> wrote:
>
> On Mon, Aug 5, 2024 at 4:23 PM Eric Dumazet <edumazet@google.com> wrote:
> > > Each time we call inet_csk_destroy_sock(), we must make sure we've
> > > already set the state to TCP_CLOSE. Based on this, I think we can use
> > > this as an indicator to avoid calling twice to destroy the socket.
> >
> > I do not think this will work.
> >
> > With this patch, a listener socket will not get an error notification.
> >
> > Ideally we need tests for this seldom used feature.
>
> FWIW there is a fair amount of test coverage here:
>
> https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/sock_diag_test.py
>
> though unfortunately they don't pass on unmodified kernels (I didn't
> look into why - maybe Maciej knows). I ran the tests on the "v2-ish
> patch" and they all passed except for a test that expects that
> SOCK_DESTROY on a FIN_WAIT1 socket does nothing. That seems OK because
> it's the thing your patch is trying to fix.
>
> Just to confirm - it's OK to send a RST on a connection that's already
> in FIN_WAIT1 state? Is that allowed by the RFC?

I think so. Please take a look at the following link which tells us
whether we should send an RST:

ABORT Call

    ESTABLISHED STATE
    FIN-WAIT-1 STATE
    FIN-WAIT-2 STATE
    CLOSE-WAIT STATE

      Send a reset segment:

        <SEQ=SND.NXT><CTL=RST>

      All queued SENDs and RECEIVEs should be given "connection reset"
      notification; all segments queued for transmission (except for the
      RST formed above) or retransmission should be flushed, delete the
      TCB, enter CLOSED state, and return.

https://www.ietf.org/rfc/rfc793.txt#:~:text=Specification%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20ABORT%20Call-,ABORT%20Call,-CLOSED%20STATE%20(i

Thanks,
Jason

Lorenzo Colitti Aug. 14, 2024, 7:48 a.m. UTC | #10

On Wed, Aug 14, 2024 at 1:43 PM Lorenzo Colitti <lorenzo@google.com> wrote:
> though unfortunately they don't pass on unmodified kernels (I didn't
> look into why - maybe Maciej knows).

Actually, they do: just git clone
https://android.googlesource.com/kernel/tests and from the kernel tree
do path/to/net/test/run_net_test.sh sock_diag_test.py

[net] tcp: fix forever orphan socket caused by tcp_abort

Checks

Commit Message

Comments

Patch