Message ID | 20230911165106.39384-2-kuniyu@amazon.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: Fix bind() regression for v4-mapped-v6 address | expand |
On Mon, Sep 11, 2023 at 9:52 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > Andrei Vagin reported bind() regression with strace logs. > > If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4 > socket to 127.0.0.1, the 2nd bind() should fail but now succeeds. > > from socket import * > > s1 = socket(AF_INET6, SOCK_STREAM) > s1.bind(('::ffff:0.0.0.0', 0)) > > s2 = socket(AF_INET, SOCK_STREAM) > s2.bind(('127.0.0.1', s1.getsockname()[1])) > > During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is > AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check > if tb has the v4-mapped-v6 wildcard address. > > The example above does not work after commit 5456262d2baa ("net: Fix > incorrect address comparison when searching for a bind2 bucket"), but > the blamed change is not the commit. > > Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated > as 0.0.0.0, and the sequence above worked by chance. Technically, this > case has been broken since bhash2 was introduced. > > Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0, > the 2nd bind() fails properly because we fall back to using bhash to > detect conflicts for the v4-mapped-v6 address. I think we have one more issue here: socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 4 bind(3, {sa_family=AF_INET6, sin6_port=htons(9999), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, 28) = 0 bind(4, {sa_family=AF_INET, sin_port=htons(9999), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 I think the second bind has to return EADDRINUSE, doesn't it? I don't deep dive to this code, but after a quick look, I think we can do something like this: diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 7876b7d703cb..f7a700d392d0 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -110,8 +110,13 @@ static void inet_bind2_bucket_init(struct inet_bind2_bucket *tb, tb->l3mdev = l3mdev; tb->port = port; #if IS_ENABLED(CONFIG_IPV6) - tb->family = sk->sk_family; - if (sk->sk_family == AF_INET6) + + if (sk->sk_family == AF_INET6 && + ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)) + tb->family = AF_INET; + else + tb->family = sk->sk_family; + if (tb->family == AF_INET6) tb->v6_rcv_saddr = sk->sk_v6_rcv_saddr; else #endif @@ -149,10 +154,15 @@ static bool inet_bind2_bucket_addr_match(const struct inet_bind2_bucket *tb2, const struct sock *sk) { #if IS_ENABLED(CONFIG_IPV6) - if (sk->sk_family != tb2->family) + unsigned short family = sk->sk_family; + + if (ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)) + family = AF_INET; + + if (family != tb2->family) return false; - if (sk->sk_family == AF_INET6) + if (family == AF_INET6) return ipv6_addr_equal(&tb2->v6_rcv_saddr, &sk->sk_v6_rcv_saddr); #endif @@ -816,10 +826,15 @@ static bool inet_bind2_bucket_match(const struct inet_bind2_bucket *tb, int l3mdev, const struct sock *sk) { #if IS_ENABLED(CONFIG_IPV6) - if (sk->sk_family != tb->family) + unsigned short family = sk->sk_family; + + if (ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)) + family = AF_INET; + + if (family != tb->family) return false; - if (sk->sk_family == AF_INET6) + if (family == AF_INET6) return net_eq(ib2_net(tb), net) && tb->port == port && tb->l3mdev == l3mdev && ipv6_addr_equal(&tb->v6_rcv_saddr, &sk->sk_v6_rcv_saddr); > > Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") > Reported-by: Andrei Vagin <avagin@google.com> > Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/ > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> > --- > include/net/ipv6.h | 5 +++++ > net/ipv4/inet_hashtables.c | 3 ++- > 2 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/include/net/ipv6.h b/include/net/ipv6.h > index 0675be0f3fa0..56d8217ea6cf 100644 > --- a/include/net/ipv6.h > +++ b/include/net/ipv6.h > @@ -784,6 +784,11 @@ static inline bool ipv6_addr_v4mapped(const struct in6_addr *a) > cpu_to_be32(0x0000ffff))) == 0UL; > } > > +static inline bool ipv6_addr_v4mapped_any(const struct in6_addr *a) > +{ > + return ipv6_addr_v4mapped(a) && ipv4_is_zeronet(a->s6_addr32[3]); > +} > + > static inline bool ipv6_addr_v4mapped_loopback(const struct in6_addr *a) > { > return ipv6_addr_v4mapped(a) && ipv4_is_loopback(a->s6_addr32[3]); > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c > index 7876b7d703cb..0a9b20eb81c4 100644 > --- a/net/ipv4/inet_hashtables.c > +++ b/net/ipv4/inet_hashtables.c > @@ -837,7 +837,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const > if (sk->sk_family == AF_INET) > return net_eq(ib2_net(tb), net) && tb->port == port && > tb->l3mdev == l3mdev && > - ipv6_addr_any(&tb->v6_rcv_saddr); > + (ipv6_addr_any(&tb->v6_rcv_saddr) || > + ipv6_addr_v4mapped_any(&tb->v6_rcv_saddr)); > > return false; > } > -- > 2.30.2 >
From: Andrei Vagin <avagin@google.com> Date: Mon, 11 Sep 2023 13:00:19 -0700 > On Mon, Sep 11, 2023 at 9:52 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > Andrei Vagin reported bind() regression with strace logs. > > > > If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4 > > socket to 127.0.0.1, the 2nd bind() should fail but now succeeds. > > > > from socket import * > > > > s1 = socket(AF_INET6, SOCK_STREAM) > > s1.bind(('::ffff:0.0.0.0', 0)) > > > > s2 = socket(AF_INET, SOCK_STREAM) > > s2.bind(('127.0.0.1', s1.getsockname()[1])) > > > > During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is > > AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check > > if tb has the v4-mapped-v6 wildcard address. > > > > The example above does not work after commit 5456262d2baa ("net: Fix > > incorrect address comparison when searching for a bind2 bucket"), but > > the blamed change is not the commit. > > > > Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated > > as 0.0.0.0, and the sequence above worked by chance. Technically, this > > case has been broken since bhash2 was introduced. > > > > Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0, > > the 2nd bind() fails properly because we fall back to using bhash to > > detect conflicts for the v4-mapped-v6 address. > > I think we have one more issue here: > > socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3 > socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 4 > bind(3, {sa_family=AF_INET6, sin6_port=htons(9999), > sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", > &sin6_addr), sin6_scope_id=0}, 28) = 0 > bind(4, {sa_family=AF_INET, sin_port=htons(9999), > sin_addr=inet_addr("127.0.0.1")}, 16) = 0 > > I think the second bind has to return EADDRINUSE, doesn't it? Correct, and the following patch fixes it separately. Sorry, I forgot to add you in CC of the entire series. https://lore.kernel.org/netdev/20230911183700.60878-4-kuniyu@amazon.com/
diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 0675be0f3fa0..56d8217ea6cf 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -784,6 +784,11 @@ static inline bool ipv6_addr_v4mapped(const struct in6_addr *a) cpu_to_be32(0x0000ffff))) == 0UL; } +static inline bool ipv6_addr_v4mapped_any(const struct in6_addr *a) +{ + return ipv6_addr_v4mapped(a) && ipv4_is_zeronet(a->s6_addr32[3]); +} + static inline bool ipv6_addr_v4mapped_loopback(const struct in6_addr *a) { return ipv6_addr_v4mapped(a) && ipv4_is_loopback(a->s6_addr32[3]); diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 7876b7d703cb..0a9b20eb81c4 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -837,7 +837,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const if (sk->sk_family == AF_INET) return net_eq(ib2_net(tb), net) && tb->port == port && tb->l3mdev == l3mdev && - ipv6_addr_any(&tb->v6_rcv_saddr); + (ipv6_addr_any(&tb->v6_rcv_saddr) || + ipv6_addr_v4mapped_any(&tb->v6_rcv_saddr)); return false; }
Andrei Vagin reported bind() regression with strace logs. If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4 socket to 127.0.0.1, the 2nd bind() should fail but now succeeds. from socket import * s1 = socket(AF_INET6, SOCK_STREAM) s1.bind(('::ffff:0.0.0.0', 0)) s2 = socket(AF_INET, SOCK_STREAM) s2.bind(('127.0.0.1', s1.getsockname()[1])) During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check if tb has the v4-mapped-v6 wildcard address. The example above does not work after commit 5456262d2baa ("net: Fix incorrect address comparison when searching for a bind2 bucket"), but the blamed change is not the commit. Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated as 0.0.0.0, and the sequence above worked by chance. Technically, this case has been broken since bhash2 was introduced. Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0, the 2nd bind() fails properly because we fall back to using bhash to detect conflicts for the v4-mapped-v6 address. Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Reported-by: Andrei Vagin <avagin@google.com> Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> --- include/net/ipv6.h | 5 +++++ net/ipv4/inet_hashtables.c | 3 ++- 2 files changed, 7 insertions(+), 1 deletion(-)