diff mbox series

[net,v2] ipv6: socket SO_BINDTODEVICE lookup routing fail without IPv6 rule.

Message ID 20250103054413.31581-1-shiming.cheng@mediatek.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] ipv6: socket SO_BINDTODEVICE lookup routing fail without IPv6 rule. | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/build_tools success Errors and warnings before: 0 (+0) this patch: 0 (+0)
netdev/cc_maintainers fail 1 blamed authors not CCed: hannes@stressinduktion.org; 1 maintainers not CCed: hannes@stressinduktion.org
netdev/build_clang success Errors and warnings before: 700 this patch: 700
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 329 this patch: 329
netdev/checkpatch warning CHECK: Alignment should match open parenthesis WARNING: line length of 86 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2025-01-03--09-00 (tests: 881)

Commit Message

Shiming Cheng (成诗明) Jan. 3, 2025, 5:43 a.m. UTC
When using socket IPv6 with SO_BINDTODEVICE, if IPv6 rule is not
matched, it will return ENETUNREACH. In fact, IPv4 does not behave
this way. IPv4 prioritizes looking up IP rules for routing and
forwarding, if not matched it will use socket-bound out interface
to send packets. The modification here is to make IPv6 behave the
same as IPv4. If IP rule is not found, it will also use socket-bound
out interface to send packts.

Fixes: 6f21c96a78b8 ("ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
---
 include/net/ip6_route.h |  2 ++
 net/ipv6/ip6_output.c   |  7 ++++++-
 net/ipv6/route.c        | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)

Comments

Shiming Cheng (成诗明) Jan. 3, 2025, 10:27 a.m. UTC | #1
Hi David

Test cases will be provided later,  below are the corresponding IP rule
configurations for IPv4 and IPv6 that i provided, as well as the
differences in ping results, the IPv4 result passed, but the IPv6
result failed, after adding this patch, the IPv6 result passed.

Thanks

bind out interface ccmni2 address:
ccmni2    Link encap:UNSPEC
          inet addr:10.233.33.169  Mask:255.0.0.0
          inet6 addr: fe80::1817:1da4:19a1:dc42/64 Scope: Link
          inet6 addr: 2409:8900:26f3:2541:1817:1da4:19a1:dc42/64 Scope:

1.There is no ip rule for ccmni2 and also pass.
k6991v1_64:/ # ip rule list
0:      from all lookup local
10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
11000:  from all iif lo oif dummy0 uidrange 0-0 lookup dummy0
11000:  from all iif lo oif ccmni1 uidrange 0-0 lookup ccmni1
16000:  from all fwmark 0x10063/0x1ffff iif lo lookup local_network
16000:  from all fwmark 0xd0071/0xdffff iif lo lookup ccmni1
17000:  from all iif lo oif dummy0 lookup dummy0
17000:  from all fwmark 0xc0000/0xc0000 iif lo oif ccmni1 lookup ccmni1
18000:  from all fwmark 0x0/0x10000 lookup legacy_system
19000:  from all fwmark 0x0/0x10000 lookup legacy_network
20000:  from all fwmark 0x0/0x10000 lookup local_network
32000:  from all unreachable
k6991v1_64:/ #
k6991v1_64:/ #
k6991v1_64:/ #
k6991v1_64:/ # ping -I ccmni2 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from 10.233.33.169 ccmni2: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=50 time=179 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=50 time=74.3 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=50 time=72.7 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=50 time=67.1 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=50 time=63.3 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=50 time=90.8 ms
^C
--- 8.8.8.8 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5017ms
rtt min/avg/max/mdev = 63.323/91.253/179.082/40.213 ms



2.There is no ipv6 rule for ccmni2 and fail.
k6991v1_64:/ # ip -6 rule list
0:      from all lookup local
10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
11000:  from all iif lo oif dummy0 uidrange 0-0 lookup dummy0
11000:  from all iif lo oif ccmni1 uidrange 0-0 lookup ccmni1
16000:  from all fwmark 0x10063/0x1ffff iif lo lookup local_network
16000:  from all fwmark 0xd0071/0xdffff iif lo lookup ccmni1
17000:  from all iif lo oif dummy0 lookup dummy0
17000:  from all fwmark 0xc0000/0xc0000 iif lo oif ccmni1 lookup ccmni1
18000:  from all fwmark 0x0/0x10000 lookup legacy_system
19000:  from all fwmark 0x0/0x10000 lookup legacy_network
20000:  from all fwmark 0x0/0x10000 lookup local_network
32000:  from all unreachable
k6991v1_64:/ # ping6 -I ccmni2 2001:4860:4860::8888
connect: Network is unreachable


3.After merging the submitted patch.
There is no ipv6 rule for ccmni2 and pass.

ccmni2    Link encap:UNSPEC
          inet addr:10.185.46.236  Mask:255.0.0.0
          inet6 addr: 2409:8900:24d9:404e:1817:25c4:e1c7:4e52/64 Scope:

k6897v1_64:/ # ip -6 rule list
0:      from all lookup local
10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
11000:  from all iif lo oif dummy0 uidrange 0-0 lookup dummy0
11000:  from all iif lo oif ccmni1 uidrange 0-0 lookup ccmni1
16000:  from all fwmark 0x10063/0x1ffff iif lo lookup local_network
16000:  from all fwmark 0xd0070/0xdffff iif lo lookup ccmni1
17000:  from all iif lo oif dummy0 lookup dummy0
17000:  from all fwmark 0xc0000/0xc0000 iif lo oif ccmni1 lookup ccmni1
18000:  from all fwmark 0x0/0x10000 lookup legacy_system
19000:  from all fwmark 0x0/0x10000 lookup legacy_network
20000:  from all fwmark 0x0/0x10000 lookup local_network
32000:  from all unreachable
k6897v1_64:/ # ping6 -I ccmni2 2001:4860:4860::8888
PING 2001:4860:4860::8888(2001:4860:4860::8888) from
2409:8900:24d9:404e:1817:25c4:e1c7:4e52 ccmni2: 56 data bytes
64 bytes from 2001:4860:4860::8888: icmp_seq=1 ttl=51 time=167 ms
64 bytes from 2001:4860:4860::8888: icmp_seq=2 ttl=51 time=73.9 ms
64 bytes from 2001:4860:4860::8888: icmp_seq=3 ttl=51 time=101 ms
64 bytes from 2001:4860:4860::8888: icmp_seq=4 ttl=51 time=62.3 ms
^C
--- 2001:4860:4860::8888 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 4011ms


On Fri, 2025-01-03 at 13:43 +0800, Shiming Cheng wrote:
> When using socket IPv6 with SO_BINDTODEVICE, if IPv6 rule is not
> matched, it will return ENETUNREACH. In fact, IPv4 does not behave
> this way. IPv4 prioritizes looking up IP rules for routing and
> forwarding, if not matched it will use socket-bound out interface
> to send packets. The modification here is to make IPv6 behave the
> same as IPv4. If IP rule is not found, it will also use socket-bound
> out interface to send packts.
> 
> Fixes: 6f21c96a78b8 ("ipv6: enforce flowi6_oif usage in
> ip6_dst_lookup_tail()")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> ---
>  include/net/ip6_route.h |  2 ++
>  net/ipv6/ip6_output.c   |  7 ++++++-
>  net/ipv6/route.c        | 34 ++++++++++++++++++++++++++++++++++
>  3 files changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index 6dbdf60b342f..0625597def6f 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h
> @@ -214,6 +214,8 @@ void rt6_multipath_rebalance(struct fib6_info
> *f6i);
>  
>  void rt6_uncached_list_add(struct rt6_info *rt);
>  void rt6_uncached_list_del(struct rt6_info *rt);
> +struct rt6_info *ip6_create_rt_oif_rcu(struct net *net, const struct
> sock *sk,
> +		struct flowi6 *fl6, int flags);
>  
>  static inline const struct rt6_info *skb_rt6_info(const struct
> sk_buff *skb)
>  {
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index f7b4608bb316..95728c8921cb 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1156,8 +1156,13 @@ static int ip6_dst_lookup_tail(struct net
> *net, const struct sock *sk,
>  		*dst = ip6_route_output_flags(net, sk, fl6, flags);
>  
>  	err = (*dst)->error;
> -	if (err)
> +	if (err && (flags & RT6_LOOKUP_F_IFACE)) {
> +		*dst = (struct dst_entry *)ip6_create_rt_oif_rcu(net,
> sk, fl6, flags);
> +		if (!*dst)
> +			goto out_err_release;
> +	} else if (err) {
>  		goto out_err_release;
> +	}
>  
>  #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
>  	/*
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 67ff16c04718..7d7450fab44f 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1214,6 +1214,40 @@ static struct rt6_info
> *ip6_create_rt_rcu(const struct fib6_result *res)
>  	return nrt;
>  }
>  
> +struct rt6_info *ip6_create_rt_oif_rcu(struct net *net, const struct
> sock *sk,
> +				       struct flowi6 *fl6, int flags)
> +{
> +	struct rt6_info *rt;
> +	unsigned int prefs;
> +	int err;
> +	struct net_device *dev = dev_get_by_index_rcu(net, fl6-
> >flowi6_oif);
> +
> +	if (!dev)
> +		return NULL;
> +	rt = ip6_dst_alloc(dev_net(dev), dev, flags);
> +
> +	if (!rt)
> +		return NULL;
> +	rt->dst.error = 0;
> +	rt->dst.output = ip6_output;
> +	rt->dst.lastuse = jiffies;
> +	prefs = sk ? inet6_sk(sk)->srcprefs : 0;
> +	err = ipv6_dev_get_saddr(net, dev, &fl6->daddr, prefs, &fl6-
> >saddr);
> +
> +	if (err) {
> +		dst_release(&rt->dst);
> +		return NULL;
> +	}
> +	rt->rt6i_dst.addr = fl6->daddr;
> +	rt->rt6i_dst.plen = 128;
> +	rt->rt6i_src.addr = fl6->saddr;
> +	rt->rt6i_dst.plen = 128;
> +	rt->rt6i_idev = in6_dev_get(dev);
> +	rt->rt6i_flags = flags;
> +	return rt;
> +}
> +EXPORT_SYMBOL_GPL(ip6_create_rt_oif_rcu);
> +
>  INDIRECT_CALLABLE_SCOPE struct rt6_info *ip6_pol_route_lookup(struct
> net *net,
>  					     struct fib6_table *table,
>  					     struct flowi6 *fl6,
David Ahern Jan. 3, 2025, 3:31 p.m. UTC | #2
On 1/3/25 3:27 AM, Shiming Cheng (成诗明) wrote:
> Test cases will be provided later, below are the corresponding IP rule
> configurations for IPv4 and IPv6 that i provided, as well as the
> differences in ping results, the IPv4 result passed, but the IPv6 result
> failed, after adding this patch, the IPv6 result passed.

I do not want the output of a complicated stack of ip rules with a ping
a command.

Provide a simplistic set of commands that configure the stack and show
what you believe is a problem. Anyone on this list should be able to
quickly reproduce the setup to verify it is a problem and investigate
what is happening.
Jakub Kicinski Jan. 4, 2025, 2:14 a.m. UTC | #3
On Fri, 3 Jan 2025 13:43:49 +0800 Shiming Cheng wrote:
> When using socket IPv6 with SO_BINDTODEVICE, if IPv6 rule is not
> matched, it will return ENETUNREACH. In fact, IPv4 does not behave
> this way. IPv4 prioritizes looking up IP rules for routing and
> forwarding, if not matched it will use socket-bound out interface
> to send packets. The modification here is to make IPv6 behave the
> same as IPv4. If IP rule is not found, it will also use socket-bound
> out interface to send packts.

CI shows failures in tools/testing/selftests/net/fcnal-test.sh
and tools/testing/selftests/net/fib_nexthops.sh with this patch
applied. Could be a flake but please double check before sending v3
diff mbox series

Patch

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 6dbdf60b342f..0625597def6f 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -214,6 +214,8 @@  void rt6_multipath_rebalance(struct fib6_info *f6i);
 
 void rt6_uncached_list_add(struct rt6_info *rt);
 void rt6_uncached_list_del(struct rt6_info *rt);
+struct rt6_info *ip6_create_rt_oif_rcu(struct net *net, const struct sock *sk,
+		struct flowi6 *fl6, int flags);
 
 static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f7b4608bb316..95728c8921cb 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1156,8 +1156,13 @@  static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
 		*dst = ip6_route_output_flags(net, sk, fl6, flags);
 
 	err = (*dst)->error;
-	if (err)
+	if (err && (flags & RT6_LOOKUP_F_IFACE)) {
+		*dst = (struct dst_entry *)ip6_create_rt_oif_rcu(net, sk, fl6, flags);
+		if (!*dst)
+			goto out_err_release;
+	} else if (err) {
 		goto out_err_release;
+	}
 
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
 	/*
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 67ff16c04718..7d7450fab44f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1214,6 +1214,40 @@  static struct rt6_info *ip6_create_rt_rcu(const struct fib6_result *res)
 	return nrt;
 }
 
+struct rt6_info *ip6_create_rt_oif_rcu(struct net *net, const struct sock *sk,
+				       struct flowi6 *fl6, int flags)
+{
+	struct rt6_info *rt;
+	unsigned int prefs;
+	int err;
+	struct net_device *dev = dev_get_by_index_rcu(net, fl6->flowi6_oif);
+
+	if (!dev)
+		return NULL;
+	rt = ip6_dst_alloc(dev_net(dev), dev, flags);
+
+	if (!rt)
+		return NULL;
+	rt->dst.error = 0;
+	rt->dst.output = ip6_output;
+	rt->dst.lastuse = jiffies;
+	prefs = sk ? inet6_sk(sk)->srcprefs : 0;
+	err = ipv6_dev_get_saddr(net, dev, &fl6->daddr, prefs, &fl6->saddr);
+
+	if (err) {
+		dst_release(&rt->dst);
+		return NULL;
+	}
+	rt->rt6i_dst.addr = fl6->daddr;
+	rt->rt6i_dst.plen = 128;
+	rt->rt6i_src.addr = fl6->saddr;
+	rt->rt6i_dst.plen = 128;
+	rt->rt6i_idev = in6_dev_get(dev);
+	rt->rt6i_flags = flags;
+	return rt;
+}
+EXPORT_SYMBOL_GPL(ip6_create_rt_oif_rcu);
+
 INDIRECT_CALLABLE_SCOPE struct rt6_info *ip6_pol_route_lookup(struct net *net,
 					     struct fib6_table *table,
 					     struct flowi6 *fl6,