diff mbox series

[net-next] tcp: do not allow to connect with the four-tuple symmetry socket

Message ID 20240818042538.40195-1-kerneljasonxing@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] tcp: do not allow to connect with the four-tuple symmetry socket | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch warning CHECK: Macro argument 'port' may be better as '(port)' to avoid precedence issues
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline fail Was 0 now: 1
netdev/contest fail net-next-2024-08-18--06-00 (tests: 710)

Commit Message

Jason Xing Aug. 18, 2024, 4:25 a.m. UTC
From: Jason Xing <kernelxing@tencent.com>

Four-tuple symmetry here means the socket has the same remote/local
port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
$ ss -nat | grep 8000
ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000

Before this patch, one client could start a connection successfully
as above even without a listener, which means, the socket connects
to its self. Then every time other threads trying to bind/listen on
this port will encounter a failure surely, unless the thread owning
the socket exits.

It can rarely happen on the loopback device when the connect() finds
the same port as its remote port while listener is not running. It
has the side-effect on other threads. Besides, this solo flow has no
merit, no significance at all.

After this patch, the moment we try to connect with a 4-tuple symmetry
socket, we will get an error "connect: Cannot assign requested address".

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/ipv4/inet_hashtables.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

Comments

Jason Xing Aug. 18, 2024, 5:16 a.m. UTC | #1
On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> Four-tuple symmetry here means the socket has the same remote/local
> port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
> $ ss -nat | grep 8000
> ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000
>
> Before this patch, one client could start a connection successfully
> as above even without a listener, which means, the socket connects
> to its self. Then every time other threads trying to bind/listen on
> this port will encounter a failure surely, unless the thread owning
> the socket exits.
>
> It can rarely happen on the loopback device when the connect() finds
> the same port as its remote port while listener is not running. It
> has the side-effect on other threads. Besides, this solo flow has no
> merit, no significance at all.
>
> After this patch, the moment we try to connect with a 4-tuple symmetry
> socket, we will get an error "connect: Cannot assign requested address".
>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  net/ipv4/inet_hashtables.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index 9bfcfd016e18..2f8f34ee62fb 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -978,6 +978,21 @@ void inet_bhash2_reset_saddr(struct sock *sk)
>  }
>  EXPORT_SYMBOL_GPL(inet_bhash2_reset_saddr);
>
> +/* SYMMETRY means the socket has the same local and remote port/ipaddr */
> +#define INET_ADDR_SYMMETRY(sk) (inet_sk(sk)->inet_rcv_saddr == \
> +                               inet_sk(sk)->inet_daddr)
> +#define INET_PORT_SYMMETRY(sk) (inet_sk(sk)->inet_num == \
> +                               ntohs(inet_sk(sk)->inet_dport))
> +#define INET_PORT_SYMMETRY_MATCH(sk, port) (port == \
> +                                           ntohs(inet_sk(sk)->inet_dport))
> +static inline int inet_tuple_symmetry(struct sock *sk)

Patchwork shows to me that I should not use inline in the .c file. I
will change it in due course in v2 patch.
Jason Xing Aug. 18, 2024, 1:50 p.m. UTC | #2
On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Four-tuple symmetry here means the socket has the same remote/local
> > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
> > $ ss -nat | grep 8000
> > ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000

Thanks to the failed tests appearing in patchwork, now I'm aware of
the technical term called "self-connection" in English to describe
this case. I will update accordingly the title, body messages,
function name by introducing "self-connection" words like this in the
next submission.

Following this clue, I saw many reports happening in these years, like
[1][2]. Users are often astonished about this phenomenon and lost and
have to find various ways to workaround it. Since, in my opinion, the
self-connection doesn't have any advantage and usefulness, why not
avoid it in the kernel? Could networking experts enlighten me? Thanks.

+ Dmitry
Hello Dmitry, do you know why the self-connect_ipv4/6 was introduced
in the selftests which the patch I wrote failed? Thanks.

[1]: https://adil.medium.com/what-is-tcp-self-connect-issue-be7d7b5f9f59
[2]: https://stackoverflow.com/questions/5139808/tcp-simultaneous-open-and-self-connect-prevention

Thanks,
Jason
Kuniyuki Iwashima Aug. 18, 2024, 6:48 p.m. UTC | #3
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Sun, 18 Aug 2024 21:50:51 +0800
> On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Four-tuple symmetry here means the socket has the same remote/local
> > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
> > > $ ss -nat | grep 8000
> > > ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000
> 
> Thanks to the failed tests appearing in patchwork, now I'm aware of
> the technical term called "self-connection" in English to describe
> this case. I will update accordingly the title, body messages,
> function name by introducing "self-connection" words like this in the
> next submission.
> 
> Following this clue, I saw many reports happening in these years, like
> [1][2]. Users are often astonished about this phenomenon and lost and
> have to find various ways to workaround it. Since, in my opinion, the
> self-connection doesn't have any advantage and usefulness,

It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV)
path as you see in TCP-AO tests.  See RFC 9293 and the (!ack && syn) case
in tcp_rcv_synsent_state_process().

  https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7

So you can't remove self-connect functionality, the recent main user is
syzkaller though.
Jason Xing Aug. 18, 2024, 11:48 p.m. UTC | #4
Hello Kuniyuki,

On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Sun, 18 Aug 2024 21:50:51 +0800
> > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Four-tuple symmetry here means the socket has the same remote/local
> > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
> > > > $ ss -nat | grep 8000
> > > > ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000
> >
> > Thanks to the failed tests appearing in patchwork, now I'm aware of
> > the technical term called "self-connection" in English to describe
> > this case. I will update accordingly the title, body messages,
> > function name by introducing "self-connection" words like this in the
> > next submission.
> >
> > Following this clue, I saw many reports happening in these years, like
> > [1][2]. Users are often astonished about this phenomenon and lost and
> > have to find various ways to workaround it. Since, in my opinion, the
> > self-connection doesn't have any advantage and usefulness,
>
> It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV)
> path as you see in TCP-AO tests.  See RFC 9293 and the (!ack && syn) case
> in tcp_rcv_synsent_state_process().
>
>   https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7

Yes, I noticed this one: self-connection is one particular case among
simultaneously open cases. Honestly, it's really strange that client
and server uses a single socket.

>
> So you can't remove self-connect functionality, the recent main user is
> syzkaller though.

Ah, thanks for reminding me. It seems that I have to drop this patch
and there is no good way to resolve the issue in the kernel.

Thanks,
Jason
Jason Xing Aug. 19, 2024, 12:26 a.m. UTC | #5
On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Kuniyuki,
>
> On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Jason Xing <kerneljasonxing@gmail.com>
> > Date: Sun, 18 Aug 2024 21:50:51 +0800
> > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > >
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > Four-tuple symmetry here means the socket has the same remote/local
> > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
> > > > > $ ss -nat | grep 8000
> > > > > ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000
> > >
> > > Thanks to the failed tests appearing in patchwork, now I'm aware of
> > > the technical term called "self-connection" in English to describe
> > > this case. I will update accordingly the title, body messages,
> > > function name by introducing "self-connection" words like this in the
> > > next submission.
> > >
> > > Following this clue, I saw many reports happening in these years, like
> > > [1][2]. Users are often astonished about this phenomenon and lost and
> > > have to find various ways to workaround it. Since, in my opinion, the
> > > self-connection doesn't have any advantage and usefulness,
> >
> > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV)
> > path as you see in TCP-AO tests.  See RFC 9293 and the (!ack && syn) case
> > in tcp_rcv_synsent_state_process().
> >
> >   https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7
>
> Yes, I noticed this one: self-connection is one particular case among
> simultaneously open cases. Honestly, it's really strange that client
> and server uses a single socket.
>
> >
> > So you can't remove self-connect functionality, the recent main user is
> > syzkaller though.
>
> Ah, thanks for reminding me. It seems that I have to drop this patch
> and there is no good way to resolve the issue in the kernel.
>

Can we introduce one sysctl knob to control it since we can tell there
are many user reports/complaints through the internet? Default setting
of the new knob is to allow users to connect to itself like right now,
not interfering with many years of habits, like what the test tools
currently use.

Can I give it a shot?

Thanks,
Jason
Eric Dumazet Aug. 19, 2024, 7:30 a.m. UTC | #6
On Mon, Aug 19, 2024 at 2:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Kuniyuki,
> >
> > On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > Date: Sun, 18 Aug 2024 21:50:51 +0800
> > > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > >
> > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > >
> > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > >
> > > > > > Four-tuple symmetry here means the socket has the same remote/local
> > > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
> > > > > > $ ss -nat | grep 8000
> > > > > > ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000
> > > >
> > > > Thanks to the failed tests appearing in patchwork, now I'm aware of
> > > > the technical term called "self-connection" in English to describe
> > > > this case. I will update accordingly the title, body messages,
> > > > function name by introducing "self-connection" words like this in the
> > > > next submission.
> > > >
> > > > Following this clue, I saw many reports happening in these years, like
> > > > [1][2]. Users are often astonished about this phenomenon and lost and
> > > > have to find various ways to workaround it. Since, in my opinion, the
> > > > self-connection doesn't have any advantage and usefulness,
> > >
> > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV)
> > > path as you see in TCP-AO tests.  See RFC 9293 and the (!ack && syn) case
> > > in tcp_rcv_synsent_state_process().
> > >
> > >   https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7
> >
> > Yes, I noticed this one: self-connection is one particular case among
> > simultaneously open cases. Honestly, it's really strange that client
> > and server uses a single socket.
> >
> > >
> > > So you can't remove self-connect functionality, the recent main user is
> > > syzkaller though.
> >
> > Ah, thanks for reminding me. It seems that I have to drop this patch
> > and there is no good way to resolve the issue in the kernel.
> >
>
> Can we introduce one sysctl knob to control it since we can tell there
> are many user reports/complaints through the internet? Default setting
> of the new knob is to allow users to connect to itself like right now,
> not interfering with many years of habits, like what the test tools
> currently use.
>
> Can I give it a shot?

No you can not.

netfilter can probably do this.

If it can not, it could be augmented I think.
Jason Xing Aug. 19, 2024, 9:01 a.m. UTC | #7
Hello Eric,

On Mon, Aug 19, 2024 at 3:30 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Aug 19, 2024 at 2:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > Hello Kuniyuki,
> > >
> > > On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > > Date: Sun, 18 Aug 2024 21:50:51 +0800
> > > > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > >
> > > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > >
> > > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > > >
> > > > > > > Four-tuple symmetry here means the socket has the same remote/local
> > > > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
> > > > > > > $ ss -nat | grep 8000
> > > > > > > ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000
> > > > >
> > > > > Thanks to the failed tests appearing in patchwork, now I'm aware of
> > > > > the technical term called "self-connection" in English to describe
> > > > > this case. I will update accordingly the title, body messages,
> > > > > function name by introducing "self-connection" words like this in the
> > > > > next submission.
> > > > >
> > > > > Following this clue, I saw many reports happening in these years, like
> > > > > [1][2]. Users are often astonished about this phenomenon and lost and
> > > > > have to find various ways to workaround it. Since, in my opinion, the
> > > > > self-connection doesn't have any advantage and usefulness,
> > > >
> > > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV)
> > > > path as you see in TCP-AO tests.  See RFC 9293 and the (!ack && syn) case
> > > > in tcp_rcv_synsent_state_process().
> > > >
> > > >   https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7
> > >
> > > Yes, I noticed this one: self-connection is one particular case among
> > > simultaneously open cases. Honestly, it's really strange that client
> > > and server uses a single socket.
> > >
> > > >
> > > > So you can't remove self-connect functionality, the recent main user is
> > > > syzkaller though.
> > >
> > > Ah, thanks for reminding me. It seems that I have to drop this patch
> > > and there is no good way to resolve the issue in the kernel.
> > >
> >
> > Can we introduce one sysctl knob to control it since we can tell there
> > are many user reports/complaints through the internet? Default setting
> > of the new knob is to allow users to connect to itself like right now,
> > not interfering with many years of habits, like what the test tools
> > currently use.
> >
> > Can I give it a shot?
>
> No you can not.

May I ask why? Is it because self-connection adheres to the
simultaneously open part in RFC 9293?

I feel this case is very particular, not explained well in the RFC.
Usually, we don't consider one socket to act as client and server
unless in debug or test circumstances. As you can see, some people
have encountered the issue for a long time.

>
> netfilter can probably do this.

Sure, It can do. It can be a little bit helpful, but clumsy. We have
to set specific rules for each possible listener and then drop those
SYN packets if they carry the same remote and local port/ip.

Thanks,
Jason
Eric Dumazet Aug. 19, 2024, 9:18 a.m. UTC | #8
On Mon, Aug 19, 2024 at 11:02 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Mon, Aug 19, 2024 at 3:30 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Aug 19, 2024 at 2:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > Hello Kuniyuki,
> > > >
> > > > On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > >
> > > > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > > > Date: Sun, 18 Aug 2024 21:50:51 +0800
> > > > > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > >
> > > > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > > >
> > > > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > > > >
> > > > > > > > Four-tuple symmetry here means the socket has the same remote/local
> > > > > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
> > > > > > > > $ ss -nat | grep 8000
> > > > > > > > ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000
> > > > > >
> > > > > > Thanks to the failed tests appearing in patchwork, now I'm aware of
> > > > > > the technical term called "self-connection" in English to describe
> > > > > > this case. I will update accordingly the title, body messages,
> > > > > > function name by introducing "self-connection" words like this in the
> > > > > > next submission.
> > > > > >
> > > > > > Following this clue, I saw many reports happening in these years, like
> > > > > > [1][2]. Users are often astonished about this phenomenon and lost and
> > > > > > have to find various ways to workaround it. Since, in my opinion, the
> > > > > > self-connection doesn't have any advantage and usefulness,
> > > > >
> > > > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV)
> > > > > path as you see in TCP-AO tests.  See RFC 9293 and the (!ack && syn) case
> > > > > in tcp_rcv_synsent_state_process().
> > > > >
> > > > >   https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7
> > > >
> > > > Yes, I noticed this one: self-connection is one particular case among
> > > > simultaneously open cases. Honestly, it's really strange that client
> > > > and server uses a single socket.
> > > >
> > > > >
> > > > > So you can't remove self-connect functionality, the recent main user is
> > > > > syzkaller though.
> > > >
> > > > Ah, thanks for reminding me. It seems that I have to drop this patch
> > > > and there is no good way to resolve the issue in the kernel.
> > > >
> > >
> > > Can we introduce one sysctl knob to control it since we can tell there
> > > are many user reports/complaints through the internet? Default setting
> > > of the new knob is to allow users to connect to itself like right now,
> > > not interfering with many years of habits, like what the test tools
> > > currently use.
> > >
> > > Can I give it a shot?
> >
> > No you can not.
>
> May I ask why? Is it because self-connection adheres to the
> simultaneously open part in RFC 9293?

This will break some user programs, obviously.

I will ask you the opposite : What RFC prevents the current situation ?

>
> I feel this case is very particular, not explained well in the RFC.
> Usually, we don't consider one socket to act as client and server
> unless in debug or test circumstances. As you can see, some people
> have encountered the issue for a long time.

Can you provide links to the issues ? How can a programmer hit this
using standard and documented ways (passive, active flows) ?

>
> >
> > netfilter can probably do this.
>
> Sure, It can do. It can be a little bit helpful, but clumsy. We have
> to set specific rules for each possible listener and then drop those
> SYN packets if they carry the same remote and local port/ip.
>
> Thanks,
> Jason
Jason Xing Aug. 19, 2024, 9:31 a.m. UTC | #9
On Mon, Aug 19, 2024 at 5:18 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Aug 19, 2024 at 11:02 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Mon, Aug 19, 2024 at 3:30 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Mon, Aug 19, 2024 at 2:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > >
> > > > > Hello Kuniyuki,
> > > > >
> > > > > On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > > >
> > > > > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > Date: Sun, 18 Aug 2024 21:50:51 +0800
> > > > > > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > > > > >
> > > > > > > > > Four-tuple symmetry here means the socket has the same remote/local
> > > > > > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000.
> > > > > > > > > $ ss -nat | grep 8000
> > > > > > > > > ESTAB      0      0          127.0.0.1:8000       127.0.0.1:8000
> > > > > > >
> > > > > > > Thanks to the failed tests appearing in patchwork, now I'm aware of
> > > > > > > the technical term called "self-connection" in English to describe
> > > > > > > this case. I will update accordingly the title, body messages,
> > > > > > > function name by introducing "self-connection" words like this in the
> > > > > > > next submission.
> > > > > > >
> > > > > > > Following this clue, I saw many reports happening in these years, like
> > > > > > > [1][2]. Users are often astonished about this phenomenon and lost and
> > > > > > > have to find various ways to workaround it. Since, in my opinion, the
> > > > > > > self-connection doesn't have any advantage and usefulness,
> > > > > >
> > > > > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV)
> > > > > > path as you see in TCP-AO tests.  See RFC 9293 and the (!ack && syn) case
> > > > > > in tcp_rcv_synsent_state_process().
> > > > > >
> > > > > >   https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7
> > > > >
> > > > > Yes, I noticed this one: self-connection is one particular case among
> > > > > simultaneously open cases. Honestly, it's really strange that client
> > > > > and server uses a single socket.
> > > > >
> > > > > >
> > > > > > So you can't remove self-connect functionality, the recent main user is
> > > > > > syzkaller though.
> > > > >
> > > > > Ah, thanks for reminding me. It seems that I have to drop this patch
> > > > > and there is no good way to resolve the issue in the kernel.
> > > > >
> > > >
> > > > Can we introduce one sysctl knob to control it since we can tell there
> > > > are many user reports/complaints through the internet? Default setting
> > > > of the new knob is to allow users to connect to itself like right now,
> > > > not interfering with many years of habits, like what the test tools
> > > > currently use.
> > > >
> > > > Can I give it a shot?
> > >
> > > No you can not.
> >
> > May I ask why? Is it because self-connection adheres to the
> > simultaneously open part in RFC 9293?
>
> This will break some user programs, obviously.

I agree. It'a headache.

>
> I will ask you the opposite : What RFC prevents the current situation ?

Not really, actually. The reason why I wrote the patch is because it
indeed happened.

>
> >
> > I feel this case is very particular, not explained well in the RFC.
> > Usually, we don't consider one socket to act as client and server
> > unless in debug or test circumstances. As you can see, some people
> > have encountered the issue for a long time.
>
> Can you provide links to the issues ? How can a programmer hit this
> using standard and documented ways (passive, active flows) ?
>

I've listed some of them in the previous discussion through googling
"tcp self connection" or something like that. Let me copy here:
[1]: https://adil.medium.com/what-is-tcp-self-connect-issue-be7d7b5f9f59
[2]: https://stackoverflow.com/questions/5139808/tcp-simultaneous-open-and-self-connect-prevention

After investigating such an issue more deeply in the customers'
machines, the main reason why it can happen is the listener exits
while another thread starts to connect, which can cause
self-connection, even though the chance is slim. Later, the listener
tries to listen and for sure it will fail due to that single
self-connection.

Thanks,
Jason
Eric Dumazet Aug. 19, 2024, 9:38 a.m. UTC | #10
On Mon, Aug 19, 2024 at 11:32 AM Jason Xing <kerneljasonxing@gmail.com> wrote:

> After investigating such an issue more deeply in the customers'
> machines, the main reason why it can happen is the listener exits
> while another thread starts to connect, which can cause
> self-connection, even though the chance is slim. Later, the listener
> tries to listen and for sure it will fail due to that single
> self-connection.

This would happen if the range of ephemeral ports include the listening port,
which is discouraged.

ip_local_reserved_ports is supposed to help.

This looks like a security issue to me, and netfilter can handle it.
Jason Xing Aug. 19, 2024, 9:41 a.m. UTC | #11
On Mon, Aug 19, 2024 at 5:38 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Aug 19, 2024 at 11:32 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> > After investigating such an issue more deeply in the customers'
> > machines, the main reason why it can happen is the listener exits
> > while another thread starts to connect, which can cause
> > self-connection, even though the chance is slim. Later, the listener
> > tries to listen and for sure it will fail due to that single
> > self-connection.
>
> This would happen if the range of ephemeral ports include the listening port,
> which is discouraged.

Yes.

>
> ip_local_reserved_ports is supposed to help.

Sure, I workarounded it by using this and it worked.

>
> This looks like a security issue to me, and netfilter can handle it.

I have to admit setting netfilter rules for each flow is not a very
user-friendly way.

Anyway, really thanks for your suggestion :)

Thanks,
Jason
Kuniyuki Iwashima Aug. 19, 2024, 5:56 p.m. UTC | #12
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Mon, 19 Aug 2024 17:41:32 +0800
> On Mon, Aug 19, 2024 at 5:38 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Aug 19, 2024 at 11:32 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > > After investigating such an issue more deeply in the customers'
> > > machines, the main reason why it can happen is the listener exits
> > > while another thread starts to connect, which can cause
> > > self-connection, even though the chance is slim. Later, the listener
> > > tries to listen and for sure it will fail due to that single
> > > self-connection.
> >
> > This would happen if the range of ephemeral ports include the listening port,
> > which is discouraged.
> 
> Yes.
> 
> >
> > ip_local_reserved_ports is supposed to help.
> 
> Sure, I workarounded it by using this and it worked.
> 
> >
> > This looks like a security issue to me, and netfilter can handle it.
> 
> I have to admit setting netfilter rules for each flow is not a very
> user-friendly way.

I think even netfilter is not needed.

It sounds like the server application needs to implement graceful shutdown.
The server should not release the port if there are on-going clients.  The
server should spin up a new process and use the following to transfer the
remaining connections:

  * FD passing
  * SO_REUSEPORT w/ (net.ipv4.tcp_migrate_req or BPF)

Then, no client can occupy the server's port even without
ip_local_reserved_ports.

But I still recommend using ip_local_reserved_ports unless the server port
is random.
Jason Xing Aug. 20, 2024, 12:29 a.m. UTC | #13
On Tue, Aug 20, 2024 at 1:56 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Mon, 19 Aug 2024 17:41:32 +0800
> > On Mon, Aug 19, 2024 at 5:38 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Mon, Aug 19, 2024 at 11:32 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > > After investigating such an issue more deeply in the customers'
> > > > machines, the main reason why it can happen is the listener exits
> > > > while another thread starts to connect, which can cause
> > > > self-connection, even though the chance is slim. Later, the listener
> > > > tries to listen and for sure it will fail due to that single
> > > > self-connection.
> > >
> > > This would happen if the range of ephemeral ports include the listening port,
> > > which is discouraged.
> >
> > Yes.
> >
> > >
> > > ip_local_reserved_ports is supposed to help.
> >
> > Sure, I workarounded it by using this and it worked.
> >
> > >
> > > This looks like a security issue to me, and netfilter can handle it.
> >
> > I have to admit setting netfilter rules for each flow is not a very
> > user-friendly way.
>
> I think even netfilter is not needed.
>
> It sounds like the server application needs to implement graceful shutdown.
> The server should not release the port if there are on-going clients.  The
> server should spin up a new process and use the following to transfer the
> remaining connections:
>
>   * FD passing
>   * SO_REUSEPORT w/ (net.ipv4.tcp_migrate_req or BPF)
>
> Then, no client can occupy the server's port even without
> ip_local_reserved_ports.
>
> But I still recommend using ip_local_reserved_ports unless the server port
> is random.

Yes, there are some ways that can mitigate or solve the issue.

The reason why I wrote the patch is because at the beginning I don't
think the self-connection feature is useful which you reminded me of
some test tools like syzbot could use. That's not what I was aware of.
Thanks for your reply.

Thanks,
Jason
diff mbox series

Patch

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9bfcfd016e18..2f8f34ee62fb 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -978,6 +978,21 @@  void inet_bhash2_reset_saddr(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(inet_bhash2_reset_saddr);
 
+/* SYMMETRY means the socket has the same local and remote port/ipaddr */
+#define INET_ADDR_SYMMETRY(sk) (inet_sk(sk)->inet_rcv_saddr == \
+				inet_sk(sk)->inet_daddr)
+#define INET_PORT_SYMMETRY(sk) (inet_sk(sk)->inet_num == \
+				ntohs(inet_sk(sk)->inet_dport))
+#define INET_PORT_SYMMETRY_MATCH(sk, port) (port == \
+					    ntohs(inet_sk(sk)->inet_dport))
+static inline int inet_tuple_symmetry(struct sock *sk)
+{
+	if (INET_ADDR_SYMMETRY(sk) && INET_PORT_SYMMETRY(sk))
+		return -EADDRNOTAVAIL;
+
+	return 0;
+}
+
 /* RFC 6056 3.3.4.  Algorithm 4: Double-Hash Port Selection Algorithm
  * Note that we use 32bit integers (vs RFC 'short integers')
  * because 2^16 is not a multiple of num_ephemeral and this
@@ -997,13 +1012,13 @@  int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 			struct sock *, __u16, struct inet_timewait_sock **))
 {
 	struct inet_hashinfo *hinfo = death_row->hashinfo;
+	bool tb_created = false, symmetry_test = false;
 	struct inet_bind_hashbucket *head, *head2;
 	struct inet_timewait_sock *tw = NULL;
 	int port = inet_sk(sk)->inet_num;
 	struct net *net = sock_net(sk);
 	struct inet_bind2_bucket *tb2;
 	struct inet_bind_bucket *tb;
-	bool tb_created = false;
 	u32 remaining, offset;
 	int ret, i, low, high;
 	bool local_ports;
@@ -1011,12 +1026,18 @@  int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	u32 index;
 
 	if (port) {
-		local_bh_disable();
-		ret = check_established(death_row, sk, port, NULL);
-		local_bh_enable();
+		ret = inet_tuple_symmetry(sk);
+		if (!ret) {
+			local_bh_disable();
+			ret = check_established(death_row, sk, port, NULL);
+			local_bh_enable();
+		}
 		return ret;
 	}
 
+	if (INET_ADDR_SYMMETRY(sk))
+		symmetry_test = true;
+
 	l3mdev = inet_sk_bound_l3mdev(sk);
 
 	local_ports = inet_sk_get_local_port_range(sk, &low, &high);
@@ -1046,6 +1067,8 @@  int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 			port -= remaining;
 		if (inet_is_local_reserved_port(net, port))
 			continue;
+		if (symmetry_test && INET_PORT_SYMMETRY_MATCH(sk, port))
+			continue;
 		head = &hinfo->bhash[inet_bhashfn(net, port,
 						  hinfo->bhash_size)];
 		spin_lock_bh(&head->lock);