diff mbox series

[v1,net,1/5] tcp: Fix bind() regression for v4-mapped-v6 wildcard address.

Message ID 20230911165106.39384-2-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series tcp: Fix bind() regression for v4-mapped-v6 address | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3173 this patch: 3173
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 1546 this patch: 1546
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 3420 this patch: 3420
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 20 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Kuniyuki Iwashima Sept. 11, 2023, 4:51 p.m. UTC
Andrei Vagin reported bind() regression with strace logs.

If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4
socket to 127.0.0.1, the 2nd bind() should fail but now succeeds.

  from socket import *

  s1 = socket(AF_INET6, SOCK_STREAM)
  s1.bind(('::ffff:0.0.0.0', 0))

  s2 = socket(AF_INET, SOCK_STREAM)
  s2.bind(('127.0.0.1', s1.getsockname()[1]))

During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is
AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check
if tb has the v4-mapped-v6 wildcard address.

The example above does not work after commit 5456262d2baa ("net: Fix
incorrect address comparison when searching for a bind2 bucket"), but
the blamed change is not the commit.

Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated
as 0.0.0.0, and the sequence above worked by chance.  Technically, this
case has been broken since bhash2 was introduced.

Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0,
the 2nd bind() fails properly because we fall back to using bhash to
detect conflicts for the v4-mapped-v6 address.

Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Andrei Vagin <avagin@google.com>
Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/ipv6.h         | 5 +++++
 net/ipv4/inet_hashtables.c | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Andrei Vagin Sept. 11, 2023, 8 p.m. UTC | #1
On Mon, Sep 11, 2023 at 9:52 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> Andrei Vagin reported bind() regression with strace logs.
>
> If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4
> socket to 127.0.0.1, the 2nd bind() should fail but now succeeds.
>
>   from socket import *
>
>   s1 = socket(AF_INET6, SOCK_STREAM)
>   s1.bind(('::ffff:0.0.0.0', 0))
>
>   s2 = socket(AF_INET, SOCK_STREAM)
>   s2.bind(('127.0.0.1', s1.getsockname()[1]))
>
> During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is
> AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check
> if tb has the v4-mapped-v6 wildcard address.
>
> The example above does not work after commit 5456262d2baa ("net: Fix
> incorrect address comparison when searching for a bind2 bucket"), but
> the blamed change is not the commit.
>
> Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated
> as 0.0.0.0, and the sequence above worked by chance.  Technically, this
> case has been broken since bhash2 was introduced.
>
> Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0,
> the 2nd bind() fails properly because we fall back to using bhash to
> detect conflicts for the v4-mapped-v6 address.

I think we have one more issue here:

socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 4
bind(3, {sa_family=AF_INET6, sin6_port=htons(9999),
sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1",
&sin6_addr), sin6_scope_id=0}, 28) = 0
bind(4, {sa_family=AF_INET, sin_port=htons(9999),
sin_addr=inet_addr("127.0.0.1")}, 16) = 0

I think the second bind has to return EADDRINUSE, doesn't it?

I don't deep dive to this code, but after a quick look, I think we can
do something like this:

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 7876b7d703cb..f7a700d392d0 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -110,8 +110,13 @@ static void inet_bind2_bucket_init(struct
inet_bind2_bucket *tb,
        tb->l3mdev    = l3mdev;
        tb->port      = port;
 #if IS_ENABLED(CONFIG_IPV6)
-       tb->family    = sk->sk_family;
-       if (sk->sk_family == AF_INET6)
+
+       if (sk->sk_family == AF_INET6 &&
+           ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr))
+               tb->family = AF_INET;
+       else
+               tb->family = sk->sk_family;
+       if (tb->family == AF_INET6)
                tb->v6_rcv_saddr = sk->sk_v6_rcv_saddr;
        else
 #endif
@@ -149,10 +154,15 @@ static bool inet_bind2_bucket_addr_match(const
struct inet_bind2_bucket *tb2,
                                         const struct sock *sk)
 {
 #if IS_ENABLED(CONFIG_IPV6)
-       if (sk->sk_family != tb2->family)
+       unsigned short family = sk->sk_family;
+
+       if (ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr))
+               family = AF_INET;
+
+       if (family != tb2->family)
                return false;

-       if (sk->sk_family == AF_INET6)
+       if (family == AF_INET6)
                return ipv6_addr_equal(&tb2->v6_rcv_saddr,
                                       &sk->sk_v6_rcv_saddr);
 #endif
@@ -816,10 +826,15 @@ static bool inet_bind2_bucket_match(const struct
inet_bind2_bucket *tb,
                                    int l3mdev, const struct sock *sk)
 {
 #if IS_ENABLED(CONFIG_IPV6)
-       if (sk->sk_family != tb->family)
+       unsigned short family = sk->sk_family;
+
+       if (ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr))
+               family = AF_INET;
+
+       if (family != tb->family)
                return false;

-       if (sk->sk_family == AF_INET6)
+       if (family == AF_INET6)
                return net_eq(ib2_net(tb), net) && tb->port == port &&
                        tb->l3mdev == l3mdev &&
                        ipv6_addr_equal(&tb->v6_rcv_saddr,
&sk->sk_v6_rcv_saddr);



>
> Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
> Reported-by: Andrei Vagin <avagin@google.com>
> Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  include/net/ipv6.h         | 5 +++++
>  net/ipv4/inet_hashtables.c | 3 ++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 0675be0f3fa0..56d8217ea6cf 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -784,6 +784,11 @@ static inline bool ipv6_addr_v4mapped(const struct in6_addr *a)
>                                         cpu_to_be32(0x0000ffff))) == 0UL;
>  }
>
> +static inline bool ipv6_addr_v4mapped_any(const struct in6_addr *a)
> +{
> +       return ipv6_addr_v4mapped(a) && ipv4_is_zeronet(a->s6_addr32[3]);
> +}
> +
>  static inline bool ipv6_addr_v4mapped_loopback(const struct in6_addr *a)
>  {
>         return ipv6_addr_v4mapped(a) && ipv4_is_loopback(a->s6_addr32[3]);
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index 7876b7d703cb..0a9b20eb81c4 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -837,7 +837,8 @@ bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const
>                 if (sk->sk_family == AF_INET)
>                         return net_eq(ib2_net(tb), net) && tb->port == port &&
>                                 tb->l3mdev == l3mdev &&
> -                               ipv6_addr_any(&tb->v6_rcv_saddr);
> +                               (ipv6_addr_any(&tb->v6_rcv_saddr) ||
> +                                ipv6_addr_v4mapped_any(&tb->v6_rcv_saddr));
>
>                 return false;
>         }
> --
> 2.30.2
>
Kuniyuki Iwashima Sept. 11, 2023, 8:05 p.m. UTC | #2
From: Andrei Vagin <avagin@google.com>
Date: Mon, 11 Sep 2023 13:00:19 -0700
> On Mon, Sep 11, 2023 at 9:52 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > Andrei Vagin reported bind() regression with strace logs.
> >
> > If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4
> > socket to 127.0.0.1, the 2nd bind() should fail but now succeeds.
> >
> >   from socket import *
> >
> >   s1 = socket(AF_INET6, SOCK_STREAM)
> >   s1.bind(('::ffff:0.0.0.0', 0))
> >
> >   s2 = socket(AF_INET, SOCK_STREAM)
> >   s2.bind(('127.0.0.1', s1.getsockname()[1]))
> >
> > During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is
> > AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check
> > if tb has the v4-mapped-v6 wildcard address.
> >
> > The example above does not work after commit 5456262d2baa ("net: Fix
> > incorrect address comparison when searching for a bind2 bucket"), but
> > the blamed change is not the commit.
> >
> > Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated
> > as 0.0.0.0, and the sequence above worked by chance.  Technically, this
> > case has been broken since bhash2 was introduced.
> >
> > Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0,
> > the 2nd bind() fails properly because we fall back to using bhash to
> > detect conflicts for the v4-mapped-v6 address.
> 
> I think we have one more issue here:
> 
> socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
> socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 4
> bind(3, {sa_family=AF_INET6, sin6_port=htons(9999),
> sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1",
> &sin6_addr), sin6_scope_id=0}, 28) = 0
> bind(4, {sa_family=AF_INET, sin_port=htons(9999),
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> 
> I think the second bind has to return EADDRINUSE, doesn't it?

Correct, and the following patch fixes it separately.
Sorry, I forgot to add you in CC of the entire series.
https://lore.kernel.org/netdev/20230911183700.60878-4-kuniyu@amazon.com/
diff mbox series

Patch

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 0675be0f3fa0..56d8217ea6cf 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -784,6 +784,11 @@  static inline bool ipv6_addr_v4mapped(const struct in6_addr *a)
 					cpu_to_be32(0x0000ffff))) == 0UL;
 }
 
+static inline bool ipv6_addr_v4mapped_any(const struct in6_addr *a)
+{
+	return ipv6_addr_v4mapped(a) && ipv4_is_zeronet(a->s6_addr32[3]);
+}
+
 static inline bool ipv6_addr_v4mapped_loopback(const struct in6_addr *a)
 {
 	return ipv6_addr_v4mapped(a) && ipv4_is_loopback(a->s6_addr32[3]);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 7876b7d703cb..0a9b20eb81c4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -837,7 +837,8 @@  bool inet_bind2_bucket_match_addr_any(const struct inet_bind2_bucket *tb, const
 		if (sk->sk_family == AF_INET)
 			return net_eq(ib2_net(tb), net) && tb->port == port &&
 				tb->l3mdev == l3mdev &&
-				ipv6_addr_any(&tb->v6_rcv_saddr);
+				(ipv6_addr_any(&tb->v6_rcv_saddr) ||
+				 ipv6_addr_v4mapped_any(&tb->v6_rcv_saddr));
 
 		return false;
 	}