Message ID | 20240818042538.40195-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] tcp: do not allow to connect with the four-tuple symmetry socket | expand |
On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > From: Jason Xing <kernelxing@tencent.com> > > Four-tuple symmetry here means the socket has the same remote/local > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000. > $ ss -nat | grep 8000 > ESTAB 0 0 127.0.0.1:8000 127.0.0.1:8000 > > Before this patch, one client could start a connection successfully > as above even without a listener, which means, the socket connects > to its self. Then every time other threads trying to bind/listen on > this port will encounter a failure surely, unless the thread owning > the socket exits. > > It can rarely happen on the loopback device when the connect() finds > the same port as its remote port while listener is not running. It > has the side-effect on other threads. Besides, this solo flow has no > merit, no significance at all. > > After this patch, the moment we try to connect with a 4-tuple symmetry > socket, we will get an error "connect: Cannot assign requested address". > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > net/ipv4/inet_hashtables.c | 31 +++++++++++++++++++++++++++---- > 1 file changed, 27 insertions(+), 4 deletions(-) > > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c > index 9bfcfd016e18..2f8f34ee62fb 100644 > --- a/net/ipv4/inet_hashtables.c > +++ b/net/ipv4/inet_hashtables.c > @@ -978,6 +978,21 @@ void inet_bhash2_reset_saddr(struct sock *sk) > } > EXPORT_SYMBOL_GPL(inet_bhash2_reset_saddr); > > +/* SYMMETRY means the socket has the same local and remote port/ipaddr */ > +#define INET_ADDR_SYMMETRY(sk) (inet_sk(sk)->inet_rcv_saddr == \ > + inet_sk(sk)->inet_daddr) > +#define INET_PORT_SYMMETRY(sk) (inet_sk(sk)->inet_num == \ > + ntohs(inet_sk(sk)->inet_dport)) > +#define INET_PORT_SYMMETRY_MATCH(sk, port) (port == \ > + ntohs(inet_sk(sk)->inet_dport)) > +static inline int inet_tuple_symmetry(struct sock *sk) Patchwork shows to me that I should not use inline in the .c file. I will change it in due course in v2 patch.
On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > Four-tuple symmetry here means the socket has the same remote/local > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000. > > $ ss -nat | grep 8000 > > ESTAB 0 0 127.0.0.1:8000 127.0.0.1:8000 Thanks to the failed tests appearing in patchwork, now I'm aware of the technical term called "self-connection" in English to describe this case. I will update accordingly the title, body messages, function name by introducing "self-connection" words like this in the next submission. Following this clue, I saw many reports happening in these years, like [1][2]. Users are often astonished about this phenomenon and lost and have to find various ways to workaround it. Since, in my opinion, the self-connection doesn't have any advantage and usefulness, why not avoid it in the kernel? Could networking experts enlighten me? Thanks. + Dmitry Hello Dmitry, do you know why the self-connect_ipv4/6 was introduced in the selftests which the patch I wrote failed? Thanks. [1]: https://adil.medium.com/what-is-tcp-self-connect-issue-be7d7b5f9f59 [2]: https://stackoverflow.com/questions/5139808/tcp-simultaneous-open-and-self-connect-prevention Thanks, Jason
From: Jason Xing <kerneljasonxing@gmail.com> Date: Sun, 18 Aug 2024 21:50:51 +0800 > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > Four-tuple symmetry here means the socket has the same remote/local > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000. > > > $ ss -nat | grep 8000 > > > ESTAB 0 0 127.0.0.1:8000 127.0.0.1:8000 > > Thanks to the failed tests appearing in patchwork, now I'm aware of > the technical term called "self-connection" in English to describe > this case. I will update accordingly the title, body messages, > function name by introducing "self-connection" words like this in the > next submission. > > Following this clue, I saw many reports happening in these years, like > [1][2]. Users are often astonished about this phenomenon and lost and > have to find various ways to workaround it. Since, in my opinion, the > self-connection doesn't have any advantage and usefulness, It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV) path as you see in TCP-AO tests. See RFC 9293 and the (!ack && syn) case in tcp_rcv_synsent_state_process(). https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 So you can't remove self-connect functionality, the recent main user is syzkaller though.
Hello Kuniyuki, On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > From: Jason Xing <kerneljasonxing@gmail.com> > Date: Sun, 18 Aug 2024 21:50:51 +0800 > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > Four-tuple symmetry here means the socket has the same remote/local > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000. > > > > $ ss -nat | grep 8000 > > > > ESTAB 0 0 127.0.0.1:8000 127.0.0.1:8000 > > > > Thanks to the failed tests appearing in patchwork, now I'm aware of > > the technical term called "self-connection" in English to describe > > this case. I will update accordingly the title, body messages, > > function name by introducing "self-connection" words like this in the > > next submission. > > > > Following this clue, I saw many reports happening in these years, like > > [1][2]. Users are often astonished about this phenomenon and lost and > > have to find various ways to workaround it. Since, in my opinion, the > > self-connection doesn't have any advantage and usefulness, > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV) > path as you see in TCP-AO tests. See RFC 9293 and the (!ack && syn) case > in tcp_rcv_synsent_state_process(). > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 Yes, I noticed this one: self-connection is one particular case among simultaneously open cases. Honestly, it's really strange that client and server uses a single socket. > > So you can't remove self-connect functionality, the recent main user is > syzkaller though. Ah, thanks for reminding me. It seems that I have to drop this patch and there is no good way to resolve the issue in the kernel. Thanks, Jason
On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > Hello Kuniyuki, > > On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > Date: Sun, 18 Aug 2024 21:50:51 +0800 > > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > Four-tuple symmetry here means the socket has the same remote/local > > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000. > > > > > $ ss -nat | grep 8000 > > > > > ESTAB 0 0 127.0.0.1:8000 127.0.0.1:8000 > > > > > > Thanks to the failed tests appearing in patchwork, now I'm aware of > > > the technical term called "self-connection" in English to describe > > > this case. I will update accordingly the title, body messages, > > > function name by introducing "self-connection" words like this in the > > > next submission. > > > > > > Following this clue, I saw many reports happening in these years, like > > > [1][2]. Users are often astonished about this phenomenon and lost and > > > have to find various ways to workaround it. Since, in my opinion, the > > > self-connection doesn't have any advantage and usefulness, > > > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV) > > path as you see in TCP-AO tests. See RFC 9293 and the (!ack && syn) case > > in tcp_rcv_synsent_state_process(). > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 > > Yes, I noticed this one: self-connection is one particular case among > simultaneously open cases. Honestly, it's really strange that client > and server uses a single socket. > > > > > So you can't remove self-connect functionality, the recent main user is > > syzkaller though. > > Ah, thanks for reminding me. It seems that I have to drop this patch > and there is no good way to resolve the issue in the kernel. > Can we introduce one sysctl knob to control it since we can tell there are many user reports/complaints through the internet? Default setting of the new knob is to allow users to connect to itself like right now, not interfering with many years of habits, like what the test tools currently use. Can I give it a shot? Thanks, Jason
On Mon, Aug 19, 2024 at 2:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > Hello Kuniyuki, > > > > On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > Date: Sun, 18 Aug 2024 21:50:51 +0800 > > > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > Four-tuple symmetry here means the socket has the same remote/local > > > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000. > > > > > > $ ss -nat | grep 8000 > > > > > > ESTAB 0 0 127.0.0.1:8000 127.0.0.1:8000 > > > > > > > > Thanks to the failed tests appearing in patchwork, now I'm aware of > > > > the technical term called "self-connection" in English to describe > > > > this case. I will update accordingly the title, body messages, > > > > function name by introducing "self-connection" words like this in the > > > > next submission. > > > > > > > > Following this clue, I saw many reports happening in these years, like > > > > [1][2]. Users are often astonished about this phenomenon and lost and > > > > have to find various ways to workaround it. Since, in my opinion, the > > > > self-connection doesn't have any advantage and usefulness, > > > > > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV) > > > path as you see in TCP-AO tests. See RFC 9293 and the (!ack && syn) case > > > in tcp_rcv_synsent_state_process(). > > > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 > > > > Yes, I noticed this one: self-connection is one particular case among > > simultaneously open cases. Honestly, it's really strange that client > > and server uses a single socket. > > > > > > > > So you can't remove self-connect functionality, the recent main user is > > > syzkaller though. > > > > Ah, thanks for reminding me. It seems that I have to drop this patch > > and there is no good way to resolve the issue in the kernel. > > > > Can we introduce one sysctl knob to control it since we can tell there > are many user reports/complaints through the internet? Default setting > of the new knob is to allow users to connect to itself like right now, > not interfering with many years of habits, like what the test tools > currently use. > > Can I give it a shot? No you can not. netfilter can probably do this. If it can not, it could be augmented I think.
Hello Eric, On Mon, Aug 19, 2024 at 3:30 PM Eric Dumazet <edumazet@google.com> wrote: > > On Mon, Aug 19, 2024 at 2:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > Hello Kuniyuki, > > > > > > On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > > Date: Sun, 18 Aug 2024 21:50:51 +0800 > > > > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > > > Four-tuple symmetry here means the socket has the same remote/local > > > > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000. > > > > > > > $ ss -nat | grep 8000 > > > > > > > ESTAB 0 0 127.0.0.1:8000 127.0.0.1:8000 > > > > > > > > > > Thanks to the failed tests appearing in patchwork, now I'm aware of > > > > > the technical term called "self-connection" in English to describe > > > > > this case. I will update accordingly the title, body messages, > > > > > function name by introducing "self-connection" words like this in the > > > > > next submission. > > > > > > > > > > Following this clue, I saw many reports happening in these years, like > > > > > [1][2]. Users are often astonished about this phenomenon and lost and > > > > > have to find various ways to workaround it. Since, in my opinion, the > > > > > self-connection doesn't have any advantage and usefulness, > > > > > > > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV) > > > > path as you see in TCP-AO tests. See RFC 9293 and the (!ack && syn) case > > > > in tcp_rcv_synsent_state_process(). > > > > > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 > > > > > > Yes, I noticed this one: self-connection is one particular case among > > > simultaneously open cases. Honestly, it's really strange that client > > > and server uses a single socket. > > > > > > > > > > > So you can't remove self-connect functionality, the recent main user is > > > > syzkaller though. > > > > > > Ah, thanks for reminding me. It seems that I have to drop this patch > > > and there is no good way to resolve the issue in the kernel. > > > > > > > Can we introduce one sysctl knob to control it since we can tell there > > are many user reports/complaints through the internet? Default setting > > of the new knob is to allow users to connect to itself like right now, > > not interfering with many years of habits, like what the test tools > > currently use. > > > > Can I give it a shot? > > No you can not. May I ask why? Is it because self-connection adheres to the simultaneously open part in RFC 9293? I feel this case is very particular, not explained well in the RFC. Usually, we don't consider one socket to act as client and server unless in debug or test circumstances. As you can see, some people have encountered the issue for a long time. > > netfilter can probably do this. Sure, It can do. It can be a little bit helpful, but clumsy. We have to set specific rules for each possible listener and then drop those SYN packets if they carry the same remote and local port/ip. Thanks, Jason
On Mon, Aug 19, 2024 at 11:02 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > Hello Eric, > > On Mon, Aug 19, 2024 at 3:30 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Mon, Aug 19, 2024 at 2:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > Hello Kuniyuki, > > > > > > > > On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > > > Date: Sun, 18 Aug 2024 21:50:51 +0800 > > > > > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > > > > > Four-tuple symmetry here means the socket has the same remote/local > > > > > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000. > > > > > > > > $ ss -nat | grep 8000 > > > > > > > > ESTAB 0 0 127.0.0.1:8000 127.0.0.1:8000 > > > > > > > > > > > > Thanks to the failed tests appearing in patchwork, now I'm aware of > > > > > > the technical term called "self-connection" in English to describe > > > > > > this case. I will update accordingly the title, body messages, > > > > > > function name by introducing "self-connection" words like this in the > > > > > > next submission. > > > > > > > > > > > > Following this clue, I saw many reports happening in these years, like > > > > > > [1][2]. Users are often astonished about this phenomenon and lost and > > > > > > have to find various ways to workaround it. Since, in my opinion, the > > > > > > self-connection doesn't have any advantage and usefulness, > > > > > > > > > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV) > > > > > path as you see in TCP-AO tests. See RFC 9293 and the (!ack && syn) case > > > > > in tcp_rcv_synsent_state_process(). > > > > > > > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 > > > > > > > > Yes, I noticed this one: self-connection is one particular case among > > > > simultaneously open cases. Honestly, it's really strange that client > > > > and server uses a single socket. > > > > > > > > > > > > > > So you can't remove self-connect functionality, the recent main user is > > > > > syzkaller though. > > > > > > > > Ah, thanks for reminding me. It seems that I have to drop this patch > > > > and there is no good way to resolve the issue in the kernel. > > > > > > > > > > Can we introduce one sysctl knob to control it since we can tell there > > > are many user reports/complaints through the internet? Default setting > > > of the new knob is to allow users to connect to itself like right now, > > > not interfering with many years of habits, like what the test tools > > > currently use. > > > > > > Can I give it a shot? > > > > No you can not. > > May I ask why? Is it because self-connection adheres to the > simultaneously open part in RFC 9293? This will break some user programs, obviously. I will ask you the opposite : What RFC prevents the current situation ? > > I feel this case is very particular, not explained well in the RFC. > Usually, we don't consider one socket to act as client and server > unless in debug or test circumstances. As you can see, some people > have encountered the issue for a long time. Can you provide links to the issues ? How can a programmer hit this using standard and documented ways (passive, active flows) ? > > > > > netfilter can probably do this. > > Sure, It can do. It can be a little bit helpful, but clumsy. We have > to set specific rules for each possible listener and then drop those > SYN packets if they carry the same remote and local port/ip. > > Thanks, > Jason
On Mon, Aug 19, 2024 at 5:18 PM Eric Dumazet <edumazet@google.com> wrote: > > On Mon, Aug 19, 2024 at 11:02 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > Hello Eric, > > > > On Mon, Aug 19, 2024 at 3:30 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Mon, Aug 19, 2024 at 2:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > On Mon, Aug 19, 2024 at 7:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > Hello Kuniyuki, > > > > > > > > > > On Mon, Aug 19, 2024 at 2:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > > > > Date: Sun, 18 Aug 2024 21:50:51 +0800 > > > > > > > On Sun, Aug 18, 2024 at 1:16 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > > > > > > > On Sun, Aug 18, 2024 at 12:25 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > > > > > > > Four-tuple symmetry here means the socket has the same remote/local > > > > > > > > > port and ipaddr, like this, 127.0.0.1:8000 -> 127.0.0.1:8000. > > > > > > > > > $ ss -nat | grep 8000 > > > > > > > > > ESTAB 0 0 127.0.0.1:8000 127.0.0.1:8000 > > > > > > > > > > > > > > Thanks to the failed tests appearing in patchwork, now I'm aware of > > > > > > > the technical term called "self-connection" in English to describe > > > > > > > this case. I will update accordingly the title, body messages, > > > > > > > function name by introducing "self-connection" words like this in the > > > > > > > next submission. > > > > > > > > > > > > > > Following this clue, I saw many reports happening in these years, like > > > > > > > [1][2]. Users are often astonished about this phenomenon and lost and > > > > > > > have to find various ways to workaround it. Since, in my opinion, the > > > > > > > self-connection doesn't have any advantage and usefulness, > > > > > > > > > > > > It's useful if you want to test simultaneous connect (SYN_SENT -> SYN_RECV) > > > > > > path as you see in TCP-AO tests. See RFC 9293 and the (!ack && syn) case > > > > > > in tcp_rcv_synsent_state_process(). > > > > > > > > > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 > > > > > > > > > > Yes, I noticed this one: self-connection is one particular case among > > > > > simultaneously open cases. Honestly, it's really strange that client > > > > > and server uses a single socket. > > > > > > > > > > > > > > > > > So you can't remove self-connect functionality, the recent main user is > > > > > > syzkaller though. > > > > > > > > > > Ah, thanks for reminding me. It seems that I have to drop this patch > > > > > and there is no good way to resolve the issue in the kernel. > > > > > > > > > > > > > Can we introduce one sysctl knob to control it since we can tell there > > > > are many user reports/complaints through the internet? Default setting > > > > of the new knob is to allow users to connect to itself like right now, > > > > not interfering with many years of habits, like what the test tools > > > > currently use. > > > > > > > > Can I give it a shot? > > > > > > No you can not. > > > > May I ask why? Is it because self-connection adheres to the > > simultaneously open part in RFC 9293? > > This will break some user programs, obviously. I agree. It'a headache. > > I will ask you the opposite : What RFC prevents the current situation ? Not really, actually. The reason why I wrote the patch is because it indeed happened. > > > > > I feel this case is very particular, not explained well in the RFC. > > Usually, we don't consider one socket to act as client and server > > unless in debug or test circumstances. As you can see, some people > > have encountered the issue for a long time. > > Can you provide links to the issues ? How can a programmer hit this > using standard and documented ways (passive, active flows) ? > I've listed some of them in the previous discussion through googling "tcp self connection" or something like that. Let me copy here: [1]: https://adil.medium.com/what-is-tcp-self-connect-issue-be7d7b5f9f59 [2]: https://stackoverflow.com/questions/5139808/tcp-simultaneous-open-and-self-connect-prevention After investigating such an issue more deeply in the customers' machines, the main reason why it can happen is the listener exits while another thread starts to connect, which can cause self-connection, even though the chance is slim. Later, the listener tries to listen and for sure it will fail due to that single self-connection. Thanks, Jason
On Mon, Aug 19, 2024 at 11:32 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > After investigating such an issue more deeply in the customers' > machines, the main reason why it can happen is the listener exits > while another thread starts to connect, which can cause > self-connection, even though the chance is slim. Later, the listener > tries to listen and for sure it will fail due to that single > self-connection. This would happen if the range of ephemeral ports include the listening port, which is discouraged. ip_local_reserved_ports is supposed to help. This looks like a security issue to me, and netfilter can handle it.
On Mon, Aug 19, 2024 at 5:38 PM Eric Dumazet <edumazet@google.com> wrote: > > On Mon, Aug 19, 2024 at 11:32 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > After investigating such an issue more deeply in the customers' > > machines, the main reason why it can happen is the listener exits > > while another thread starts to connect, which can cause > > self-connection, even though the chance is slim. Later, the listener > > tries to listen and for sure it will fail due to that single > > self-connection. > > This would happen if the range of ephemeral ports include the listening port, > which is discouraged. Yes. > > ip_local_reserved_ports is supposed to help. Sure, I workarounded it by using this and it worked. > > This looks like a security issue to me, and netfilter can handle it. I have to admit setting netfilter rules for each flow is not a very user-friendly way. Anyway, really thanks for your suggestion :) Thanks, Jason
From: Jason Xing <kerneljasonxing@gmail.com> Date: Mon, 19 Aug 2024 17:41:32 +0800 > On Mon, Aug 19, 2024 at 5:38 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Mon, Aug 19, 2024 at 11:32 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > After investigating such an issue more deeply in the customers' > > > machines, the main reason why it can happen is the listener exits > > > while another thread starts to connect, which can cause > > > self-connection, even though the chance is slim. Later, the listener > > > tries to listen and for sure it will fail due to that single > > > self-connection. > > > > This would happen if the range of ephemeral ports include the listening port, > > which is discouraged. > > Yes. > > > > > ip_local_reserved_ports is supposed to help. > > Sure, I workarounded it by using this and it worked. > > > > > This looks like a security issue to me, and netfilter can handle it. > > I have to admit setting netfilter rules for each flow is not a very > user-friendly way. I think even netfilter is not needed. It sounds like the server application needs to implement graceful shutdown. The server should not release the port if there are on-going clients. The server should spin up a new process and use the following to transfer the remaining connections: * FD passing * SO_REUSEPORT w/ (net.ipv4.tcp_migrate_req or BPF) Then, no client can occupy the server's port even without ip_local_reserved_ports. But I still recommend using ip_local_reserved_ports unless the server port is random.
On Tue, Aug 20, 2024 at 1:56 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > From: Jason Xing <kerneljasonxing@gmail.com> > Date: Mon, 19 Aug 2024 17:41:32 +0800 > > On Mon, Aug 19, 2024 at 5:38 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Mon, Aug 19, 2024 at 11:32 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > After investigating such an issue more deeply in the customers' > > > > machines, the main reason why it can happen is the listener exits > > > > while another thread starts to connect, which can cause > > > > self-connection, even though the chance is slim. Later, the listener > > > > tries to listen and for sure it will fail due to that single > > > > self-connection. > > > > > > This would happen if the range of ephemeral ports include the listening port, > > > which is discouraged. > > > > Yes. > > > > > > > > ip_local_reserved_ports is supposed to help. > > > > Sure, I workarounded it by using this and it worked. > > > > > > > > This looks like a security issue to me, and netfilter can handle it. > > > > I have to admit setting netfilter rules for each flow is not a very > > user-friendly way. > > I think even netfilter is not needed. > > It sounds like the server application needs to implement graceful shutdown. > The server should not release the port if there are on-going clients. The > server should spin up a new process and use the following to transfer the > remaining connections: > > * FD passing > * SO_REUSEPORT w/ (net.ipv4.tcp_migrate_req or BPF) > > Then, no client can occupy the server's port even without > ip_local_reserved_ports. > > But I still recommend using ip_local_reserved_ports unless the server port > is random. Yes, there are some ways that can mitigate or solve the issue. The reason why I wrote the patch is because at the beginning I don't think the self-connection feature is useful which you reminded me of some test tools like syzbot could use. That's not what I was aware of. Thanks for your reply. Thanks, Jason
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 9bfcfd016e18..2f8f34ee62fb 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -978,6 +978,21 @@ void inet_bhash2_reset_saddr(struct sock *sk) } EXPORT_SYMBOL_GPL(inet_bhash2_reset_saddr); +/* SYMMETRY means the socket has the same local and remote port/ipaddr */ +#define INET_ADDR_SYMMETRY(sk) (inet_sk(sk)->inet_rcv_saddr == \ + inet_sk(sk)->inet_daddr) +#define INET_PORT_SYMMETRY(sk) (inet_sk(sk)->inet_num == \ + ntohs(inet_sk(sk)->inet_dport)) +#define INET_PORT_SYMMETRY_MATCH(sk, port) (port == \ + ntohs(inet_sk(sk)->inet_dport)) +static inline int inet_tuple_symmetry(struct sock *sk) +{ + if (INET_ADDR_SYMMETRY(sk) && INET_PORT_SYMMETRY(sk)) + return -EADDRNOTAVAIL; + + return 0; +} + /* RFC 6056 3.3.4. Algorithm 4: Double-Hash Port Selection Algorithm * Note that we use 32bit integers (vs RFC 'short integers') * because 2^16 is not a multiple of num_ephemeral and this @@ -997,13 +1012,13 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, struct sock *, __u16, struct inet_timewait_sock **)) { struct inet_hashinfo *hinfo = death_row->hashinfo; + bool tb_created = false, symmetry_test = false; struct inet_bind_hashbucket *head, *head2; struct inet_timewait_sock *tw = NULL; int port = inet_sk(sk)->inet_num; struct net *net = sock_net(sk); struct inet_bind2_bucket *tb2; struct inet_bind_bucket *tb; - bool tb_created = false; u32 remaining, offset; int ret, i, low, high; bool local_ports; @@ -1011,12 +1026,18 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, u32 index; if (port) { - local_bh_disable(); - ret = check_established(death_row, sk, port, NULL); - local_bh_enable(); + ret = inet_tuple_symmetry(sk); + if (!ret) { + local_bh_disable(); + ret = check_established(death_row, sk, port, NULL); + local_bh_enable(); + } return ret; } + if (INET_ADDR_SYMMETRY(sk)) + symmetry_test = true; + l3mdev = inet_sk_bound_l3mdev(sk); local_ports = inet_sk_get_local_port_range(sk, &low, &high); @@ -1046,6 +1067,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, port -= remaining; if (inet_is_local_reserved_port(net, port)) continue; + if (symmetry_test && INET_PORT_SYMMETRY_MATCH(sk, port)) + continue; head = &hinfo->bhash[inet_bhashfn(net, port, hinfo->bhash_size)]; spin_lock_bh(&head->lock);