diff mbox series

[net] ipv4: Fix tos mask in inet_rtm_getroute()

Message ID b2d237d08317ca55926add9654a48409ac1b8f5b.1606412894.git.gnault@redhat.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series [net] ipv4: Fix tos mask in inet_rtm_getroute() | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 19 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 5 this patch: 5
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Guillaume Nault Nov. 26, 2020, 6:09 p.m. UTC
When inet_rtm_getroute() was converted to use the RCU variants of
ip_route_input() and ip_route_output_key(), the TOS parameters
stopped being masked with IPTOS_RT_MASK before doing the route lookup.

As a result, "ip route get" can return a different route than what
would be used when sending real packets.

For example:

    $ ip route add 192.0.2.11/32 dev eth0
    $ ip route add unreachable 192.0.2.11/32 tos 2
    $ ip route get 192.0.2.11 tos 2
    RTNETLINK answers: No route to host

But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would
actually be routed using the first route:

    $ ping -c 1 -Q 2 192.0.2.11
    PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data.
    64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms

    --- 192.0.2.11 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms

This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to
return results consistent with real route lookups.

Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 net/ipv4/route.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

David Ahern Nov. 28, 2020, 5:03 p.m. UTC | #1
On 11/26/20 11:09 AM, Guillaume Nault wrote:
> When inet_rtm_getroute() was converted to use the RCU variants of
> ip_route_input() and ip_route_output_key(), the TOS parameters
> stopped being masked with IPTOS_RT_MASK before doing the route lookup.
> 
> As a result, "ip route get" can return a different route than what
> would be used when sending real packets.
> 
> For example:
> 
>     $ ip route add 192.0.2.11/32 dev eth0
>     $ ip route add unreachable 192.0.2.11/32 tos 2
>     $ ip route get 192.0.2.11 tos 2
>     RTNETLINK answers: No route to host
> 
> But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would
> actually be routed using the first route:
> 
>     $ ping -c 1 -Q 2 192.0.2.11
>     PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data.
>     64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms
> 
>     --- 192.0.2.11 ping statistics ---
>     1 packets transmitted, 1 received, 0% packet loss, time 0ms
>     rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms
> 
> This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to
> return results consistent with real route lookups.
> 
> Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
>  net/ipv4/route.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>
Jakub Kicinski Nov. 28, 2020, 9:17 p.m. UTC | #2
On Sat, 28 Nov 2020 10:03:42 -0700 David Ahern wrote:
> On 11/26/20 11:09 AM, Guillaume Nault wrote:
> > When inet_rtm_getroute() was converted to use the RCU variants of
> > ip_route_input() and ip_route_output_key(), the TOS parameters
> > stopped being masked with IPTOS_RT_MASK before doing the route lookup.
> > 
> > As a result, "ip route get" can return a different route than what
> > would be used when sending real packets.
> > 
> > For example:
> > 
> >     $ ip route add 192.0.2.11/32 dev eth0
> >     $ ip route add unreachable 192.0.2.11/32 tos 2
> >     $ ip route get 192.0.2.11 tos 2
> >     RTNETLINK answers: No route to host
> > 
> > But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would
> > actually be routed using the first route:
> > 
> >     $ ping -c 1 -Q 2 192.0.2.11
> >     PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data.
> >     64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms
> > 
> >     --- 192.0.2.11 ping statistics ---
> >     1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >     rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms
> > 
> > This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to
> > return results consistent with real route lookups.
> > 
> > Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
> > Signed-off-by: Guillaume Nault <gnault@redhat.com>
> 
> Reviewed-by: David Ahern <dsahern@kernel.org>

Applied, thanks!

Should the discrepancy between the behavior of ip_route_input_rcu() and
ip_route_input() be addressed, possibly?
Guillaume Nault Nov. 29, 2020, 12:54 p.m. UTC | #3
On Sat, Nov 28, 2020 at 01:17:16PM -0800, Jakub Kicinski wrote:
> On Sat, 28 Nov 2020 10:03:42 -0700 David Ahern wrote:
> > On 11/26/20 11:09 AM, Guillaume Nault wrote:
> > > When inet_rtm_getroute() was converted to use the RCU variants of
> > > ip_route_input() and ip_route_output_key(), the TOS parameters
> > > stopped being masked with IPTOS_RT_MASK before doing the route lookup.
> > > 
> > > As a result, "ip route get" can return a different route than what
> > > would be used when sending real packets.
> > > 
> > > For example:
> > > 
> > >     $ ip route add 192.0.2.11/32 dev eth0
> > >     $ ip route add unreachable 192.0.2.11/32 tos 2
> > >     $ ip route get 192.0.2.11 tos 2
> > >     RTNETLINK answers: No route to host
> > > 
> > > But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would
> > > actually be routed using the first route:
> > > 
> > >     $ ping -c 1 -Q 2 192.0.2.11
> > >     PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data.
> > >     64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms
> > > 
> > >     --- 192.0.2.11 ping statistics ---
> > >     1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > >     rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms
> > > 
> > > This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to
> > > return results consistent with real route lookups.
> > > 
> > > Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
> > > Signed-off-by: Guillaume Nault <gnault@redhat.com>
> > 
> > Reviewed-by: David Ahern <dsahern@kernel.org>
> 
> Applied, thanks!
> 
> Should the discrepancy between the behavior of ip_route_input_rcu() and
> ip_route_input() be addressed, possibly?

Do you mean masking TOS with IPTOS_RT_MASK directly in
ip_route_input_rcu(), instead of in the callers?

After this patch, all callers apply IPTOS_RT_MASK before calling
ip_route_input_rcu(). So, yes, that could be easily consolidated there,
and I'll do that after net merges into net-next.

More generally, my long term plan is indeed to do mask the TOS in
central places, to get consistent behaviour across the networking
stack. However, generally speaking, I need to be careful not to break
any established behaviour.

I'm mostly worried about the ECN bits. I guess that any caller that
doesn't mask these bits has a bug (as that may break ECN, which is
there since a long time). However, there are many code paths to audit
before we can be sure.

The end goal is to fully support DSCP. Once we'll be sure that no
code path can possibly intreprete an ECN bit as TOS, we'll can safely
drop all those obsolete TOS* masks and macros from the kernel code and
simply mask out the ECN bits (thus preserving the whole DSCP space).

Please note that this is background work for me. Expect slow (but
hopefully regular) progress from me.
Jakub Kicinski Nov. 30, 2020, 4:51 p.m. UTC | #4
On Sun, 29 Nov 2020 13:54:16 +0100 Guillaume Nault wrote:
> On Sat, Nov 28, 2020 at 01:17:16PM -0800, Jakub Kicinski wrote:
> > On Sat, 28 Nov 2020 10:03:42 -0700 David Ahern wrote:  
> > > On 11/26/20 11:09 AM, Guillaume Nault wrote:  
> > > > When inet_rtm_getroute() was converted to use the RCU variants of
> > > > ip_route_input() and ip_route_output_key(), the TOS parameters
> > > > stopped being masked with IPTOS_RT_MASK before doing the route lookup.
> > > > 
> > > > As a result, "ip route get" can return a different route than what
> > > > would be used when sending real packets.
> > > > 
> > > > For example:
> > > > 
> > > >     $ ip route add 192.0.2.11/32 dev eth0
> > > >     $ ip route add unreachable 192.0.2.11/32 tos 2
> > > >     $ ip route get 192.0.2.11 tos 2
> > > >     RTNETLINK answers: No route to host
> > > > 
> > > > But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would
> > > > actually be routed using the first route:
> > > > 
> > > >     $ ping -c 1 -Q 2 192.0.2.11
> > > >     PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data.
> > > >     64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms
> > > > 
> > > >     --- 192.0.2.11 ping statistics ---
> > > >     1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > > >     rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms
> > > > 
> > > > This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to
> > > > return results consistent with real route lookups.
> > > > 
> > > > Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
> > > > Signed-off-by: Guillaume Nault <gnault@redhat.com>  
> > > 
> > > Reviewed-by: David Ahern <dsahern@kernel.org>  
> > 
> > Applied, thanks!
> > 
> > Should the discrepancy between the behavior of ip_route_input_rcu() and
> > ip_route_input() be addressed, possibly?  
> 
> Do you mean masking TOS with IPTOS_RT_MASK directly in
> ip_route_input_rcu(), instead of in the callers?
> 
> After this patch, all callers apply IPTOS_RT_MASK before calling
> ip_route_input_rcu(). So, yes, that could be easily consolidated there,
> and I'll do that after net merges into net-next.
> 
> More generally, my long term plan is indeed to do mask the TOS in
> central places, to get consistent behaviour across the networking
> stack. However, generally speaking, I need to be careful not to break
> any established behaviour.
> 
> I'm mostly worried about the ECN bits. I guess that any caller that
> doesn't mask these bits has a bug (as that may break ECN, which is
> there since a long time). However, there are many code paths to audit
> before we can be sure.
> 
> The end goal is to fully support DSCP. Once we'll be sure that no
> code path can possibly intreprete an ECN bit as TOS, we'll can safely
> drop all those obsolete TOS* masks and macros from the kernel code and
> simply mask out the ECN bits (thus preserving the whole DSCP space).

Sounds great!

> Please note that this is background work for me. Expect slow (but
> hopefully regular) progress from me.
diff mbox series

Patch

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index dc2a399cd9f4..9f43abeac3a8 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3222,7 +3222,7 @@  static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 
 	fl4.daddr = dst;
 	fl4.saddr = src;
-	fl4.flowi4_tos = rtm->rtm_tos;
+	fl4.flowi4_tos = rtm->rtm_tos & IPTOS_RT_MASK;
 	fl4.flowi4_oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0;
 	fl4.flowi4_mark = mark;
 	fl4.flowi4_uid = uid;
@@ -3246,8 +3246,9 @@  static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		fl4.flowi4_iif = iif; /* for rt_fill_info */
 		skb->dev	= dev;
 		skb->mark	= mark;
-		err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
-					 dev, &res);
+		err = ip_route_input_rcu(skb, dst, src,
+					 rtm->rtm_tos & IPTOS_RT_MASK, dev,
+					 &res);
 
 		rt = skb_rtable(skb);
 		if (err == 0 && rt->dst.error)