Message ID | 23b5678b-1e5a-be6c-ea68-b7a20dff4bbc@163.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: Fix inet_bind2_bucket_match_addr_any() regression | expand |
Hi, Thanks for the patch. From: Jianguo Wu <wujianguo106@163.com> Date: Thu, 21 Mar 2024 11:02:36 +0800 > From: Jianguo Wu <wujianguo@chinatelecom.cn> > > If we bind() a TCPv4 socket to 0.0.0.0:8090, then bind() a TCPv6(ipv6only) socket Please wrap each line at <75 characters except for logs/output like below. > to :::8090, both without SO_REUSEPORT, then bind() 127.0.0.1:8090, it should fail [::]:8090 is easier to read and the recommended way. https://datatracker.ietf.org/doc/html/rfc5952#section-6 But please keep the netstat output as is. > but now succeeds. like this: > tcp 0 0 127.0.0.1:8090 0.0.0.0:* LISTEN > tcp 0 0 0.0.0.0:8090 0.0.0.0:* LISTEN > tcp6 0 0 :::8090 :::* LISTEN > > bind() 0.0.0.0:8090, :::8090 and ::1:8090 are all fail. What do you mean by all fail ? At least, [::1]:8090 would fail with the current code in this case. > But if we bind() a TCPv6(ipv6only) socket to :::8090 first, then bind() a TCPv4 > socket to 0.0.0.0:8090, then bind() 127.0.0.1:8090, 0.0.0.0:8090, :::8090 and ::1:8090 are all fail. > > When bind() 127.0.0.1:8090, inet_bind2_bucket_match_addr_any() will return true as tb->addr_type == IPV6_ADDR_ANY, Let's use tb2 here for inet_bind2_bucket.. yes it's not consistent in some functions like inet_bind2_bucket_match_addr_any() though. > and tb is refer to the TCPv6 socket(:::8090), then inet_bhash2_conflict() return false, That is, there is no conflict, Also make it clear that the TCPv6 socket is ipv6only one. > so bind() succeeds. > > inet_bhash2_addr_any_conflict() > { > inet_bind_bucket_for_each(tb2, &head2->chain) > // tb2 is IPv6 > if (inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) > break; > > // inet_bhash2_conflict() return false > if (tb2 && inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, > reuseport_ok)) { > spin_unlock(&head2->lock); > return true; > } > > } > > Fixes: 5a22bba13d01 ("tcp: Save address type in inet_bind2_bucket.") This is not the commit that introduced the regression. Also, you need Signed-off-by tag here. > --- > net/ipv4/inet_hashtables.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c > index 7498af320164..3eeaca8a113f 100644 > --- a/net/ipv4/inet_hashtables.c > +++ b/net/ipv4/inet_hashtables.c > @@ -830,8 +830,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const > return false; > > #if IS_ENABLED(CONFIG_IPV6) > - if (tb->addr_type == IPV6_ADDR_ANY) > - return true; > + if (sk->sk_family == AF_INET6) > + return tb->addr_type == IPV6_ADDR_ANY; This fix is not correct and will break v4-mapped-v6 address cases. You can run bind_wildcard under the selftest directory. Probably we need v6_only bit in tb2 and should add some test cases in the selftest. > > if (tb->addr_type != IPV6_ADDR_MAPPED) > return false;
Hi Kuniyuki, Thanks for your reply! On 2024/3/21 12:55, Kuniyuki Iwashima wrote: > Hi, > > Thanks for the patch. > > From: Jianguo Wu <wujianguo106@163.com> > Date: Thu, 21 Mar 2024 11:02:36 +0800 >> From: Jianguo Wu <wujianguo@chinatelecom.cn> >> >> If we bind() a TCPv4 socket to 0.0.0.0:8090, then bind() a TCPv6(ipv6only) socket > > Please wrap each line at <75 characters except for logs/output like below. > OK. > >> to :::8090, both without SO_REUSEPORT, then bind() 127.0.0.1:8090, it should fail > > [::]:8090 is easier to read and the recommended way. > https://datatracker.ietf.org/doc/html/rfc5952#section-6 > > But please keep the netstat output as is. > >> but now succeeds. like this: >> tcp 0 0 127.0.0.1:8090 0.0.0.0:* LISTEN >> tcp 0 0 0.0.0.0:8090 0.0.0.0:* LISTEN >> tcp6 0 0 :::8090 :::* LISTEN >> >> bind() 0.0.0.0:8090, :::8090 and ::1:8090 are all fail. > > What do you mean by all fail ? > At least, [::1]:8090 would fail with the current code in this case. In my test, 127.0.0.1:8090 succeeds, 0.0.0.0:8090, [::]:8090 and [::1]:8090 are all fail > > >> But if we bind() a TCPv6(ipv6only) socket to :::8090 first, then bind() a TCPv4 >> socket to 0.0.0.0:8090, then bind() 127.0.0.1:8090, 0.0.0.0:8090, :::8090 and ::1:8090 are all fail. >> >> When bind() 127.0.0.1:8090, inet_bind2_bucket_match_addr_any() will return true as tb->addr_type == IPV6_ADDR_ANY, > > Let's use tb2 here for inet_bind2_bucket.. yes it's not consistent > in some functions like inet_bind2_bucket_match_addr_any() though. yes, inet_bind2_bucket_match_addr_any() use tb, so I use tb here. > > >> and tb is refer to the TCPv6 socket(:::8090), then inet_bhash2_conflict() return false, That is, there is no conflict, > > Also make it clear that the TCPv6 socket is ipv6only one. > > >> so bind() succeeds. >> >> inet_bhash2_addr_any_conflict() >> { >> inet_bind_bucket_for_each(tb2, &head2->chain) >> // tb2 is IPv6 >> if (inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) >> break; >> >> // inet_bhash2_conflict() return false >> if (tb2 && inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, >> reuseport_ok)) { >> spin_unlock(&head2->lock); >> return true; >> } >> >> } >> >> Fixes: 5a22bba13d01 ("tcp: Save address type in inet_bind2_bucket.") > > This is not the commit that introduced the regression. I will remove this. > > Also, you need Signed-off-by tag here. OK. > > >> --- >> net/ipv4/inet_hashtables.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c >> index 7498af320164..3eeaca8a113f 100644 >> --- a/net/ipv4/inet_hashtables.c >> +++ b/net/ipv4/inet_hashtables.c >> @@ -830,8 +830,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const >> return false; >> >> #if IS_ENABLED(CONFIG_IPV6) >> - if (tb->addr_type == IPV6_ADDR_ANY) >> - return true; >> + if (sk->sk_family == AF_INET6) >> + return tb->addr_type == IPV6_ADDR_ANY; > > This fix is not correct and will break v4-mapped-v6 address cases. > You can run bind_wildcard under the selftest directory. >> Probably we need v6_only bit in tb2 and should add some test cases > in the selftest. How about this? I add a new field ipv6_only to struct inet_bind2_bucket{} diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 7f1b38458743..fb7c250a663b 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -96,6 +96,7 @@ struct inet_bind2_bucket { int l3mdev; unsigned short port; #if IS_ENABLED(CONFIG_IPV6) + bool ipv6_only; unsigned short addr_type; struct in6_addr v6_rcv_saddr; #define rcv_saddr v6_rcv_saddr.s6_addr32[3] diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index cf88eca5f1b4..5fc749f8f2b1 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -110,6 +110,7 @@ static void inet_bind2_bucket_init(struct inet_bind2_bucket *tb2, tb2->port = tb->port; #if IS_ENABLED(CONFIG_IPV6) BUILD_BUG_ON(USHRT_MAX < (IPV6_ADDR_ANY | IPV6_ADDR_MAPPED)); + tb2->ipv6_only = ipv6_only_sock(sk); if (sk->sk_family == AF_INET6) { tb2->addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); tb2->v6_rcv_saddr = sk->sk_v6_rcv_saddr; @@ -831,7 +832,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const #if IS_ENABLED(CONFIG_IPV6) if (tb->addr_type == IPV6_ADDR_ANY) - return true; + if (sk->sk_family == AF_INET6 || !tb->ipv6_only) + return true; if (tb->addr_type != IPV6_ADDR_MAPPED) return false; > > >> >> if (tb->addr_type != IPV6_ADDR_MAPPED) >> return false;
From: Jianguo Wu <wujianguo106@163.com> Date: Fri, 22 Mar 2024 11:16:09 +0800 > Hi Kuniyuki, > Thanks for your reply! > > On 2024/3/21 12:55, Kuniyuki Iwashima wrote: > > Hi, > > > > Thanks for the patch. > > > > From: Jianguo Wu <wujianguo106@163.com> > > Date: Thu, 21 Mar 2024 11:02:36 +0800 > >> From: Jianguo Wu <wujianguo@chinatelecom.cn> > >> > >> If we bind() a TCPv4 socket to 0.0.0.0:8090, then bind() a TCPv6(ipv6only) socket > > > > Please wrap each line at <75 characters except for logs/output like below. > > > OK. > > > >> to :::8090, both without SO_REUSEPORT, then bind() 127.0.0.1:8090, it should fail > > > > [::]:8090 is easier to read and the recommended way. > > https://datatracker.ietf.org/doc/html/rfc5952#section-6 > > > > But please keep the netstat output as is. > > > >> but now succeeds. like this: > >> tcp 0 0 127.0.0.1:8090 0.0.0.0:* LISTEN > >> tcp 0 0 0.0.0.0:8090 0.0.0.0:* LISTEN > >> tcp6 0 0 :::8090 :::* LISTEN > >> > >> bind() 0.0.0.0:8090, :::8090 and ::1:8090 are all fail. > > > > What do you mean by all fail ? > > At least, [::1]:8090 would fail with the current code in this case. > In my test, 127.0.0.1:8090 succeeds, 0.0.0.0:8090, [::]:8090 and [::1]:8090 are all fail > > > > > > >> But if we bind() a TCPv6(ipv6only) socket to :::8090 first, then bind() a TCPv4 > >> socket to 0.0.0.0:8090, then bind() 127.0.0.1:8090, 0.0.0.0:8090, :::8090 and ::1:8090 are all fail. > >> > >> When bind() 127.0.0.1:8090, inet_bind2_bucket_match_addr_any() will return true as tb->addr_type == IPV6_ADDR_ANY, > > > > Let's use tb2 here for inet_bind2_bucket.. yes it's not consistent > > in some functions like inet_bind2_bucket_match_addr_any() though. > yes, inet_bind2_bucket_match_addr_any() use tb, so I use tb here. > > > > > > >> and tb is refer to the TCPv6 socket(:::8090), then inet_bhash2_conflict() return false, That is, there is no conflict, > > > > Also make it clear that the TCPv6 socket is ipv6only one. > > > > > >> so bind() succeeds. > >> > >> inet_bhash2_addr_any_conflict() > >> { > >> inet_bind_bucket_for_each(tb2, &head2->chain) > >> // tb2 is IPv6 > >> if (inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) > >> break; > >> > >> // inet_bhash2_conflict() return false > >> if (tb2 && inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, > >> reuseport_ok)) { > >> spin_unlock(&head2->lock); > >> return true; > >> } > >> > >> } > >> > >> Fixes: 5a22bba13d01 ("tcp: Save address type in inet_bind2_bucket.") > > > > This is not the commit that introduced the regression. > I will remove this. > > > > Also, you need Signed-off-by tag here. > OK. > > > > > >> --- > >> net/ipv4/inet_hashtables.c | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c > >> index 7498af320164..3eeaca8a113f 100644 > >> --- a/net/ipv4/inet_hashtables.c > >> +++ b/net/ipv4/inet_hashtables.c > >> @@ -830,8 +830,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const > >> return false; > >> > >> #if IS_ENABLED(CONFIG_IPV6) > >> - if (tb->addr_type == IPV6_ADDR_ANY) > >> - return true; > >> + if (sk->sk_family == AF_INET6) > >> + return tb->addr_type == IPV6_ADDR_ANY; > > > > This fix is not correct and will break v4-mapped-v6 address cases. > > You can run bind_wildcard under the selftest directory. > >> Probably we need v6_only bit in tb2 and should add some test cases > > in the selftest. > How about this? > I add a new field ipv6_only to struct inet_bind2_bucket{} > > diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h > index 7f1b38458743..fb7c250a663b 100644 > --- a/include/net/inet_hashtables.h > +++ b/include/net/inet_hashtables.h > @@ -96,6 +96,7 @@ struct inet_bind2_bucket { > int l3mdev; > unsigned short port; > #if IS_ENABLED(CONFIG_IPV6) > + bool ipv6_only; > unsigned short addr_type; > struct in6_addr v6_rcv_saddr; > #define rcv_saddr v6_rcv_saddr.s6_addr32[3] > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c > index cf88eca5f1b4..5fc749f8f2b1 100644 > --- a/net/ipv4/inet_hashtables.c > +++ b/net/ipv4/inet_hashtables.c > @@ -110,6 +110,7 @@ static void inet_bind2_bucket_init(struct inet_bind2_bucket *tb2, > tb2->port = tb->port; > #if IS_ENABLED(CONFIG_IPV6) > BUILD_BUG_ON(USHRT_MAX < (IPV6_ADDR_ANY | IPV6_ADDR_MAPPED)); > + tb2->ipv6_only = ipv6_only_sock(sk); > if (sk->sk_family == AF_INET6) { > tb2->addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); > tb2->v6_rcv_saddr = sk->sk_v6_rcv_saddr; > @@ -831,7 +832,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const > > #if IS_ENABLED(CONFIG_IPV6) > if (tb->addr_type == IPV6_ADDR_ANY) > - return true; > + if (sk->sk_family == AF_INET6 || !tb->ipv6_only) > + return true; > > if (tb->addr_type != IPV6_ADDR_MAPPED) > return false; > I found a corner case that does not work with this change. The 3rd bind() should fail but succeeds with the v6only flag approach. ---8<--- from socket import * s1 = socket(AF_INET6) s1.setsockopt(41, IPV6_V6ONLY, 1) s1.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s1.bind(('::', 8000)) s2 = socket(AF_INET6) s2.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s2.bind(('::', 8000)) s3 = socket(AF_INET) s3.bind(('127.0.0.1', 8000)) ---8<--- To cover this case, the only solution is to iterate over the buckets in the same hash. Also, this change can be backported to 6.1. ---8<--- diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 612aa1d2eff7..63e8f2df0681 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -288,6 +288,7 @@ static bool inet_bhash2_addr_any_conflict(const struct sock *sk, int port, int l struct sock_reuseport *reuseport_cb; struct inet_bind_hashbucket *head2; struct inet_bind2_bucket *tb2; + bool conflict = false; bool reuseport_cb_ok; rcu_read_lock(); @@ -300,18 +301,20 @@ static bool inet_bhash2_addr_any_conflict(const struct sock *sk, int port, int l spin_lock(&head2->lock); - inet_bind_bucket_for_each(tb2, &head2->chain) - if (inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) - break; + inet_bind_bucket_for_each(tb2, &head2->chain) { + if (!inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) + continue; - if (tb2 && inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, - reuseport_ok)) { - spin_unlock(&head2->lock); - return true; + if (!inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, reuseport_ok)) + continue; + + conflict = true; + break; } spin_unlock(&head2->lock); - return false; + + return conflict; } /* ---8<--- Also, I found another regression that my recent patch introduced. The second bind() should succeed but now fails. ---8<--- from socket import * s1 = socket(AF_INET6) s1.setsockopt(41, IPV6_V6ONLY, 1) s1.bind(('::', 8000)) s2 = socket(AF_INET6) s2.bind(('::ffff:127.0.0.1', 8000)) ---8<--- and we need this change. ---8<--- diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 7d8090f109ef..612aa1d2eff7 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -203,7 +203,8 @@ static bool __inet_bhash2_conflict(const struct sock *sk, struct sock *sk2, kuid_t sk_uid, bool relax, bool reuseport_cb_ok, bool reuseport_ok) { - if (sk->sk_family == AF_INET && ipv6_only_sock(sk2)) + if (ipv6_only_sock(sk2) && + (sk->sk_family == AF_INET || ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr))) return false; return inet_bind_conflict(sk, sk2, sk_uid, relax, ---8<--- I'll post a series including necessary tests. Thanks!
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 7498af320164..3eeaca8a113f 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -830,8 +830,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const return false; #if IS_ENABLED(CONFIG_IPV6) - if (tb->addr_type == IPV6_ADDR_ANY) - return true; + if (sk->sk_family == AF_INET6) + return tb->addr_type == IPV6_ADDR_ANY; if (tb->addr_type != IPV6_ADDR_MAPPED) return false;
From: Jianguo Wu <wujianguo@chinatelecom.cn> If we bind() a TCPv4 socket to 0.0.0.0:8090, then bind() a TCPv6(ipv6only) socket to :::8090, both without SO_REUSEPORT, then bind() 127.0.0.1:8090, it should fail but now succeeds. like this: tcp 0 0 127.0.0.1:8090 0.0.0.0:* LISTEN tcp 0 0 0.0.0.0:8090 0.0.0.0:* LISTEN tcp6 0 0 :::8090 :::* LISTEN bind() 0.0.0.0:8090, :::8090 and ::1:8090 are all fail. But if we bind() a TCPv6(ipv6only) socket to :::8090 first, then bind() a TCPv4 socket to 0.0.0.0:8090, then bind() 127.0.0.1:8090, 0.0.0.0:8090, :::8090 and ::1:8090 are all fail. When bind() 127.0.0.1:8090, inet_bind2_bucket_match_addr_any() will return true as tb->addr_type == IPV6_ADDR_ANY, and tb is refer to the TCPv6 socket(:::8090), then inet_bhash2_conflict() return false, That is, there is no conflict, so bind() succeeds. inet_bhash2_addr_any_conflict() { inet_bind_bucket_for_each(tb2, &head2->chain) // tb2 is IPv6 if (inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) break; // inet_bhash2_conflict() return false if (tb2 && inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, reuseport_ok)) { spin_unlock(&head2->lock); return true; } } Fixes: 5a22bba13d01 ("tcp: Save address type in inet_bind2_bucket.") --- net/ipv4/inet_hashtables.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)