Message ID | 20230830061550.2319741-1-liuhangbin@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] ipv6: do not merge differe type and protocol routes | expand |
On 8/30/23 12:15 AM, Hangbin Liu wrote: > Different with IPv4, IPv6 will auto merge the same metric routes into > multipath routes. But the different type and protocol routes are also > merged, which will lost user's configure info. e.g. > > + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 > + ip -6 route show table 100 > local 2001:db8:103::/64 metric 1024 pref medium > nexthop via 2001:db8:101::10 dev dummy1 weight 1 > nexthop via 2001:db8:101::10 dev dummy2 weight 1 > > + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200 > + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200 > + ip -6 route show table 200 > 2001:db8:104::/64 proto kernel metric 1024 pref medium > nexthop via 2001:db8:101::10 dev dummy1 weight 1 > nexthop via 2001:db8:101::10 dev dummy2 weight 1 > > So let's skip counting the different type and protocol routes as siblings. > After update, the different type/protocol routes will not be merged. > > + ip -6 route show table 100 > local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium > 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium > > + ip -6 route show table 200 > 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium > 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium > > Reported-by: Thomas Haller <thaller@redhat.com> > Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2161994 > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> > --- > All fib test passed: > Tests passed: 203 > Tests failed: 0 Please add the above tests for this case to the fib tests script. > --- > net/ipv6/ip6_fib.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index 28b01a068412..f60f5d14f034 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -1133,6 +1133,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, > rt->fib6_pmtu); > return -EEXIST; > } > + > + if (iter->fib6_type != rt->fib6_type || > + iter->fib6_protocol != rt->fib6_protocol) > + goto next_iter; > + > /* If we have the same destination and the same metric, > * but not the same gateway, then the route we try to > * add is sibling to this route, increment our counter Reviewed-by: David Ahern <dsahern@kernel.org>
Le 30/08/2023 à 08:15, Hangbin Liu a écrit : > Different with IPv4, IPv6 will auto merge the same metric routes into > multipath routes. But the different type and protocol routes are also > merged, which will lost user's configure info. e.g. > > + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 > + ip -6 route show table 100 > local 2001:db8:103::/64 metric 1024 pref medium > nexthop via 2001:db8:101::10 dev dummy1 weight 1 > nexthop via 2001:db8:101::10 dev dummy2 weight 1 > > + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200 > + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200 > + ip -6 route show table 200 > 2001:db8:104::/64 proto kernel metric 1024 pref medium > nexthop via 2001:db8:101::10 dev dummy1 weight 1 > nexthop via 2001:db8:101::10 dev dummy2 weight 1 > > So let's skip counting the different type and protocol routes as siblings. > After update, the different type/protocol routes will not be merged. > > + ip -6 route show table 100 > local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium > 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium > > + ip -6 route show table 200 > 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium > 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium This seems wrong. The goal of 'ip route append' is to add a next hop, not to create a new route. Ok, it adds a new route if no route exists, but it seems wrong to me to use it by default, instead of 'add', to make things work magically. It seems more correct to return an error in these cases, but this will change the uapi and it may break existing setups. Before this patch, both next hops could be used by the kernel. After it, one route will be ignored (the former or the last one?). This is confusing and also seems wrong. > > Reported-by: Thomas Haller <thaller@redhat.com> > Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2161994 Please, don't put private link. The bug entry is not public. Can you explain what is the initial problem? > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> > --- > All fib test passed: > Tests passed: 203 > Tests failed: 0 > --- > net/ipv6/ip6_fib.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index 28b01a068412..f60f5d14f034 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -1133,6 +1133,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, > rt->fib6_pmtu); > return -EEXIST; > } > + > + if (iter->fib6_type != rt->fib6_type || > + iter->fib6_protocol != rt->fib6_protocol) > + goto next_iter; > + > /* If we have the same destination and the same metric, > * but not the same gateway, then the route we try to > * add is sibling to this route, increment our counter
On 8/30/23 9:29 AM, Nicolas Dichtel wrote: > Le 30/08/2023 à 08:15, Hangbin Liu a écrit : >> Different with IPv4, IPv6 will auto merge the same metric routes into >> multipath routes. But the different type and protocol routes are also >> merged, which will lost user's configure info. e.g. >> >> + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 >> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 >> + ip -6 route show table 100 >> local 2001:db8:103::/64 metric 1024 pref medium >> nexthop via 2001:db8:101::10 dev dummy1 weight 1 >> nexthop via 2001:db8:101::10 dev dummy2 weight 1 >> >> + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200 >> + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200 >> + ip -6 route show table 200 >> 2001:db8:104::/64 proto kernel metric 1024 pref medium >> nexthop via 2001:db8:101::10 dev dummy1 weight 1 >> nexthop via 2001:db8:101::10 dev dummy2 weight 1 >> >> So let's skip counting the different type and protocol routes as siblings. >> After update, the different type/protocol routes will not be merged. >> >> + ip -6 route show table 100 >> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium >> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium >> >> + ip -6 route show table 200 >> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium >> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium > > This seems wrong. The goal of 'ip route append' is to add a next hop, not to > create a new route. Ok, it adds a new route if no route exists, but it seems > wrong to me to use it by default, instead of 'add', to make things work magically. Legacy API; nothing can be done about that (ie., that append makes a new route when none exists). > > It seems more correct to return an error in these cases, but this will change > the uapi and it may break existing setups. > > Before this patch, both next hops could be used by the kernel. After it, one > route will be ignored (the former or the last one?). This is confusing and also > seems wrong. Append should match all details of a route to add to an existing entry and make it multipath. If there is a difference (especially the type - protocol difference is arguable) in attributes, then they are different routes.
On Wed, Aug 30, 2023 at 08:49:18AM -0600, David Ahern wrote: > On 8/30/23 12:15 AM, Hangbin Liu wrote: > > Different with IPv4, IPv6 will auto merge the same metric routes into > > multipath routes. But the different type and protocol routes are also > > merged, which will lost user's configure info. e.g. > > > > + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 > > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 > > + ip -6 route show table 100 > > local 2001:db8:103::/64 metric 1024 pref medium > > nexthop via 2001:db8:101::10 dev dummy1 weight 1 > > nexthop via 2001:db8:101::10 dev dummy2 weight 1 > > > > + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200 > > + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200 > > + ip -6 route show table 200 > > 2001:db8:104::/64 proto kernel metric 1024 pref medium > > nexthop via 2001:db8:101::10 dev dummy1 weight 1 > > nexthop via 2001:db8:101::10 dev dummy2 weight 1 > > > > So let's skip counting the different type and protocol routes as siblings. > > After update, the different type/protocol routes will not be merged. > > > > + ip -6 route show table 100 > > local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium > > 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium > > > > + ip -6 route show table 200 > > 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium > > 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium > > > > Reported-by: Thomas Haller <thaller@redhat.com> > > Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2161994 > > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> > > --- > > All fib test passed: > > Tests passed: 203 > > Tests failed: 0 > > Please add the above tests for this case to the fib tests script. OK, I will add them and re-post after net-next open. Thanks Hangbin
Le 30/08/2023 à 20:57, David Ahern a écrit : > On 8/30/23 9:29 AM, Nicolas Dichtel wrote: >> Le 30/08/2023 à 08:15, Hangbin Liu a écrit : >>> Different with IPv4, IPv6 will auto merge the same metric routes into >>> multipath routes. But the different type and protocol routes are also >>> merged, which will lost user's configure info. e.g. >>> >>> + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 >>> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 >>> + ip -6 route show table 100 >>> local 2001:db8:103::/64 metric 1024 pref medium >>> nexthop via 2001:db8:101::10 dev dummy1 weight 1 >>> nexthop via 2001:db8:101::10 dev dummy2 weight 1 >>> >>> + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200 >>> + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200 >>> + ip -6 route show table 200 >>> 2001:db8:104::/64 proto kernel metric 1024 pref medium >>> nexthop via 2001:db8:101::10 dev dummy1 weight 1 >>> nexthop via 2001:db8:101::10 dev dummy2 weight 1 >>> >>> So let's skip counting the different type and protocol routes as siblings. >>> After update, the different type/protocol routes will not be merged. >>> >>> + ip -6 route show table 100 >>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium >>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium >>> >>> + ip -6 route show table 200 >>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium >>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium >> >> This seems wrong. The goal of 'ip route append' is to add a next hop, not to >> create a new route. Ok, it adds a new route if no route exists, but it seems >> wrong to me to use it by default, instead of 'add', to make things work magically. > > Legacy API; nothing can be done about that (ie., that append makes a new > route when none exists). > >> >> It seems more correct to return an error in these cases, but this will change >> the uapi and it may break existing setups. >> >> Before this patch, both next hops could be used by the kernel. After it, one >> route will be ignored (the former or the last one?). This is confusing and also >> seems wrong. > > Append should match all details of a route to add to an existing entry > and make it multipath. If there is a difference (especially the type - > protocol difference is arguable) in attributes, then they are different > routes. > As you said, the protocol difference is arguable. It's not a property of the route, just a hint. I think the 'append' should match a route whatever the protocol is. 'ip route change' for example does not use the protocol to find the existing route, it will update it: $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 $ ip -6 route 2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp $ ip -6 route 2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel $ ip -6 route 2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium Why would 'append' selects route differently? This patch breaks the legacy API.
Hi Nicolas, On Thu, Aug 31, 2023 at 10:17:19AM +0200, Nicolas Dichtel wrote: > >>> So let's skip counting the different type and protocol routes as siblings. > >>> After update, the different type/protocol routes will not be merged. > >>> > >>> + ip -6 route show table 100 > >>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium > >>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium > >>> > >>> + ip -6 route show table 200 > >>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium > >>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium > >> > >> This seems wrong. The goal of 'ip route append' is to add a next hop, not to > >> create a new route. Ok, it adds a new route if no route exists, but it seems > >> wrong to me to use it by default, instead of 'add', to make things work magically. > > > > Legacy API; nothing can be done about that (ie., that append makes a new > > route when none exists). > > > >> > >> It seems more correct to return an error in these cases, but this will change > >> the uapi and it may break existing setups. > >> > >> Before this patch, both next hops could be used by the kernel. After it, one > >> route will be ignored (the former or the last one?). This is confusing and also > >> seems wrong. > > > > Append should match all details of a route to add to an existing entry > > and make it multipath. If there is a difference (especially the type - > > protocol difference is arguable) in attributes, then they are different > > routes. > > > > As you said, the protocol difference is arguable. It's not a property of the > route, just a hint. > I think the 'append' should match a route whatever the protocol is. > 'ip route change' for example does not use the protocol to find the existing > route, it will update it: > > $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 > $ ip -6 route > 2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium > $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp > $ ip -6 route > 2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium > $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel > $ ip -6 route > 2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium Not sure if I understand correctly, `ip route replace` should able to replace all other field other than dest and dev. It's for changing the route, not only nexthop. > > Why would 'append' selects route differently? The append should also works for a single route, not only for append nexthop, no? > > This patch breaks the legacy API. As the patch's description. Who would expect different type/protocol route should be merged as multipath route? I don't think the old API is correct. Thanks Hangbin
Le 31/08/2023 à 12:14, Hangbin Liu a écrit : > Hi Nicolas, > On Thu, Aug 31, 2023 at 10:17:19AM +0200, Nicolas Dichtel wrote: >>>>> So let's skip counting the different type and protocol routes as siblings. >>>>> After update, the different type/protocol routes will not be merged. >>>>> >>>>> + ip -6 route show table 100 >>>>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium >>>>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium >>>>> >>>>> + ip -6 route show table 200 >>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium >>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium >>>> >>>> This seems wrong. The goal of 'ip route append' is to add a next hop, not to >>>> create a new route. Ok, it adds a new route if no route exists, but it seems >>>> wrong to me to use it by default, instead of 'add', to make things work magically. >>> >>> Legacy API; nothing can be done about that (ie., that append makes a new >>> route when none exists). >>> >>>> >>>> It seems more correct to return an error in these cases, but this will change >>>> the uapi and it may break existing setups. >>>> >>>> Before this patch, both next hops could be used by the kernel. After it, one >>>> route will be ignored (the former or the last one?). This is confusing and also >>>> seems wrong. >>> >>> Append should match all details of a route to add to an existing entry >>> and make it multipath. If there is a difference (especially the type - >>> protocol difference is arguable) in attributes, then they are different >>> routes. >>> >> >> As you said, the protocol difference is arguable. It's not a property of the >> route, just a hint. >> I think the 'append' should match a route whatever the protocol is. >> 'ip route change' for example does not use the protocol to find the existing >> route, it will update it: >> >> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 >> $ ip -6 route >> 2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium >> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp >> $ ip -6 route >> 2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium >> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel >> $ ip -6 route >> 2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium > > Not sure if I understand correctly, `ip route replace` should able to > replace all other field other than dest and dev. It's for changing the route, > not only nexthop. >> >> Why would 'append' selects route differently? > > The append should also works for a single route, not only for append nexthop, no? I don't think so. The 'append' should 'join', not add. Adding more cases where a route is added instead of appended doesn't make the API clearer. With this patch, it will be possible to add a new route with the 'append' command when the 'add' command fails: $ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 $ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 RTNETLINK answers: File exists $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200 $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200 RTNETLINK answers: File exists This makes the API more confusing and complex. And I don't understand how it will be used later. There will be 2 routes on the system, but only one will be used, which one? This is confusing. > >> >> This patch breaks the legacy API. > > As the patch's description. Who would expect different type/protocol route > should be merged as multipath route? I don't think the old API is correct. The question is not 'who expect', but 'is there some systems somewhere that rely on this (deliberately or not)'. Frankly, the protocol is just informative, so I don't see why it is a problem to ignore it with the 'append' command. For the type, it is weird, for sure. Rejecting the command seems better than duplicating routes. Which route is used by the stack? Regards, Nicolas
On 8/31/23 5:58 AM, Nicolas Dichtel wrote: > Le 31/08/2023 à 12:14, Hangbin Liu a écrit : >> Hi Nicolas, >> On Thu, Aug 31, 2023 at 10:17:19AM +0200, Nicolas Dichtel wrote: >>>>>> So let's skip counting the different type and protocol routes as siblings. >>>>>> After update, the different type/protocol routes will not be merged. >>>>>> >>>>>> + ip -6 route show table 100 >>>>>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium >>>>>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium >>>>>> >>>>>> + ip -6 route show table 200 >>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium >>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium >>>>> >>>>> This seems wrong. The goal of 'ip route append' is to add a next hop, not to >>>>> create a new route. Ok, it adds a new route if no route exists, but it seems >>>>> wrong to me to use it by default, instead of 'add', to make things work magically. >>>> >>>> Legacy API; nothing can be done about that (ie., that append makes a new >>>> route when none exists). >>>> >>>>> >>>>> It seems more correct to return an error in these cases, but this will change >>>>> the uapi and it may break existing setups. >>>>> >>>>> Before this patch, both next hops could be used by the kernel. After it, one >>>>> route will be ignored (the former or the last one?). This is confusing and also >>>>> seems wrong. >>>> >>>> Append should match all details of a route to add to an existing entry >>>> and make it multipath. If there is a difference (especially the type - >>>> protocol difference is arguable) in attributes, then they are different >>>> routes. >>>> >>> >>> As you said, the protocol difference is arguable. It's not a property of the >>> route, just a hint. >>> I think the 'append' should match a route whatever the protocol is. >>> 'ip route change' for example does not use the protocol to find the existing >>> route, it will update it: >>> >>> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 >>> $ ip -6 route >>> 2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium >>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp >>> $ ip -6 route >>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium >>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel >>> $ ip -6 route >>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium >> >> Not sure if I understand correctly, `ip route replace` should able to >> replace all other field other than dest and dev. It's for changing the route, >> not only nexthop. >>> >>> Why would 'append' selects route differently? >> >> The append should also works for a single route, not only for append nexthop, no? > I don't think so. The 'append' should 'join', not add. Adding more cases where a > route is added instead of appended doesn't make the API clearer. > > With this patch, it will be possible to add a new route with the 'append' > command when the 'add' command fails: > $ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 > $ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 > RTNETLINK answers: File exists > > $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200 > $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200 > RTNETLINK answers: File exists > > This makes the API more confusing and complex. And I don't understand how it > will be used later. There will be 2 routes on the system, but only one will be > used, which one? This is confusing. > >> >>> >>> This patch breaks the legacy API. >> >> As the patch's description. Who would expect different type/protocol route >> should be merged as multipath route? I don't think the old API is correct. > The question is not 'who expect', but 'is there some systems somewhere that rely > on this (deliberately or not)'. > Frankly, the protocol is just informative, so I don't see why it is a problem to > ignore it with the 'append' command. > For the type, it is weird, for sure. Rejecting the command seems better than > duplicating routes. Which route is used by the stack? > > Part of my intent with fib_tests.sh was to document the legacy meaning of 'append, prepend, replace, and change' options while also providing a test script to detect changes that cause a regression. I do agree now that protocol is informative (passthrough from the kernel perspective) so not really part of the route. That should be dropped from the patch leaving just a check on rt_type as to whether the routes are different. From there the append, prepend, replace and change semantics should decide what happens (ie., how the route is inserted).
On Thu, Aug 31, 2023 at 01:58:48PM +0200, Nicolas Dichtel wrote: > > The append should also works for a single route, not only for append nexthop, no? > I don't think so. The 'append' should 'join', not add. Adding more cases where a > route is added instead of appended doesn't make the API clearer. > > With this patch, it will be possible to add a new route with the 'append' > command when the 'add' command fails: > $ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 > $ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 > RTNETLINK answers: File exists > > $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200 > $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200 > RTNETLINK answers: File exists > > This makes the API more confusing and complex. And I don't understand how it > will be used later. There will be 2 routes on the system, but only one will be > used, which one? This is confusing. Just to makeit it clear, the new patch will not add two route with only different type/protocol. Here is the result with my patch. + ip -6 route flush table 300 + ip link add dummy1 up type dummy + ip link add dummy2 up type dummy + ip addr add 2001:db8:101::1/64 dev dummy1 + ip addr add 2001:db8:101::2/64 dev dummy2 + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 RTNETLINK answers: File exists ^^ here the append still failed + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 + ip -6 route show table 100 local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200 + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto bgp table 200 RTNETLINK answers: File exists ^^ And here + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200 + ip -6 route show table 200 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium Thanks Hangbin
Le 31/08/2023 à 20:27, David Ahern a écrit : > On 8/31/23 5:58 AM, Nicolas Dichtel wrote: >> Le 31/08/2023 à 12:14, Hangbin Liu a écrit : >>> Hi Nicolas, >>> On Thu, Aug 31, 2023 at 10:17:19AM +0200, Nicolas Dichtel wrote: >>>>>>> So let's skip counting the different type and protocol routes as siblings. >>>>>>> After update, the different type/protocol routes will not be merged. >>>>>>> >>>>>>> + ip -6 route show table 100 >>>>>>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium >>>>>>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium >>>>>>> >>>>>>> + ip -6 route show table 200 >>>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium >>>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium >>>>>> >>>>>> This seems wrong. The goal of 'ip route append' is to add a next hop, not to >>>>>> create a new route. Ok, it adds a new route if no route exists, but it seems >>>>>> wrong to me to use it by default, instead of 'add', to make things work magically. >>>>> >>>>> Legacy API; nothing can be done about that (ie., that append makes a new >>>>> route when none exists). >>>>> >>>>>> >>>>>> It seems more correct to return an error in these cases, but this will change >>>>>> the uapi and it may break existing setups. >>>>>> >>>>>> Before this patch, both next hops could be used by the kernel. After it, one >>>>>> route will be ignored (the former or the last one?). This is confusing and also >>>>>> seems wrong. >>>>> >>>>> Append should match all details of a route to add to an existing entry >>>>> and make it multipath. If there is a difference (especially the type - >>>>> protocol difference is arguable) in attributes, then they are different >>>>> routes. >>>>> >>>> >>>> As you said, the protocol difference is arguable. It's not a property of the >>>> route, just a hint. >>>> I think the 'append' should match a route whatever the protocol is. >>>> 'ip route change' for example does not use the protocol to find the existing >>>> route, it will update it: >>>> >>>> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 >>>> $ ip -6 route >>>> 2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium >>>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp >>>> $ ip -6 route >>>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium >>>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel >>>> $ ip -6 route >>>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium >>> >>> Not sure if I understand correctly, `ip route replace` should able to >>> replace all other field other than dest and dev. It's for changing the route, >>> not only nexthop. >>>> >>>> Why would 'append' selects route differently? >>> >>> The append should also works for a single route, not only for append nexthop, no? >> I don't think so. The 'append' should 'join', not add. Adding more cases where a >> route is added instead of appended doesn't make the API clearer. >> >> With this patch, it will be possible to add a new route with the 'append' >> command when the 'add' command fails: >> $ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 >> $ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 >> RTNETLINK answers: File exists >> >> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200 >> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200 >> RTNETLINK answers: File exists >> >> This makes the API more confusing and complex. And I don't understand how it >> will be used later. There will be 2 routes on the system, but only one will be >> used, which one? This is confusing. >> >>> >>>> >>>> This patch breaks the legacy API. >>> >>> As the patch's description. Who would expect different type/protocol route >>> should be merged as multipath route? I don't think the old API is correct. >> The question is not 'who expect', but 'is there some systems somewhere that rely >> on this (deliberately or not)'. >> Frankly, the protocol is just informative, so I don't see why it is a problem to >> ignore it with the 'append' command. >> For the type, it is weird, for sure. Rejecting the command seems better than >> duplicating routes. Which route is used by the stack? >> >> > > Part of my intent with fib_tests.sh was to document the legacy meaning > of 'append, prepend, replace, and change' options while also providing a > test script to detect changes that cause a regression. Yes, it's a good idea for sure. > > I do agree now that protocol is informative (passthrough from the kernel > perspective) so not really part of the route. That should be dropped > from the patch leaving just a check on rt_type as to whether the routes > are different. From there the append, prepend, replace and change > semantics should decide what happens (ie., how the route is inserted). Right. What can guide us is the meaning/concept/benefit of having this kind of routing table: local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium I don't understand how this is used/useful. It's why I ask for the use case/goal of this patch. How does the user know which route is used? Regards, Nicolas
Le 01/09/2023 à 05:58, Hangbin Liu a écrit : > On Thu, Aug 31, 2023 at 01:58:48PM +0200, Nicolas Dichtel wrote: >>> The append should also works for a single route, not only for append nexthop, no? >> I don't think so. The 'append' should 'join', not add. Adding more cases where a >> route is added instead of appended doesn't make the API clearer. It was clear to me. I distinguish the route part and the next hop part. >> >> With this patch, it will be possible to add a new route with the 'append' >> command when the 'add' command fails: >> $ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 >> $ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200 >> RTNETLINK answers: File exists >> >> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200 >> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200 >> RTNETLINK answers: File exists >> >> This makes the API more confusing and complex. And I don't understand how it >> will be used later. There will be 2 routes on the system, but only one will be >> used, which one? This is confusing. > > Just to makeit it clear, the new patch will not add two route with only > different type/protocol. Here is the result with my patch. > > + ip -6 route flush table 300 > + ip link add dummy1 up type dummy > + ip link add dummy2 up type dummy > + ip addr add 2001:db8:101::1/64 dev dummy1 > + ip addr add 2001:db8:101::2/64 dev dummy2 > + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 > RTNETLINK answers: File exists > > ^^ here the append still failed And if I play a little bit of the devil's advocate: why is it rejected? It's not the same route, the types differ :) > > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 > + ip -6 route show table 100 > local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium > 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium What is the purpose of such a routing table? How is the gateway of a local route used by the kernel? Which route will be used when a packet enters the stack? Regards, Nicolas
On Fri, Sep 01, 2023 at 11:36:51AM +0200, Nicolas Dichtel wrote: > > I do agree now that protocol is informative (passthrough from the kernel > > perspective) so not really part of the route. That should be dropped I'm not sure. Is there any user space route daemon will use this info? e.g. some BGP route daemon? > > from the patch leaving just a check on rt_type as to whether the routes > > are different. From there the append, prepend, replace and change > > semantics should decide what happens (ie., how the route is inserted). > Right. What can guide us is the meaning/concept/benefit of having this kind of > routing table: > local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium > 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium > > I don't understand how this is used/useful. It's why I ask for the use case/goal > of this patch. > How does the user know which route is used? I'm not sure how user will use it. Maybe just block/forward some traffic to local first and remove the local route later to unblock them. IPv4 can also do like this. e.g. + ip link add dummy1 up type dummy + ip link add dummy2 up type dummy + ip addr add 192.168.0.1/24 dev dummy1 + ip addr add 192.168.0.2/24 dev dummy2 + ip route add local 192.168.3.0/24 dev dummy1 table 100 + ip route append unicast 192.168.3.0/24 dev dummy1 table 100 + ip route append unicast 192.168.3.0/24 dev dummy2 table 100 + ip route show table 100 local 192.168.3.0/24 dev dummy1 scope host 192.168.3.0/24 dev dummy1 scope link 192.168.3.0/24 dev dummy2 scope link Thanks Hangbin
On 9/14/23 8:23 PM, Hangbin Liu wrote: > On Fri, Sep 01, 2023 at 11:36:51AM +0200, Nicolas Dichtel wrote: >>> I do agree now that protocol is informative (passthrough from the kernel >>> perspective) so not really part of the route. That should be dropped > > I'm not sure. Is there any user space route daemon will use this info? e.g. some > BGP route daemon? It is passthrough information for userspace to track who installed the route. It is not part of the route itself, but metadata passed in and returned.
On Fri, Sep 01, 2023 at 11:50:12AM +0200, Nicolas Dichtel wrote: > > + ip -6 route flush table 300 > > + ip link add dummy1 up type dummy > > + ip link add dummy2 up type dummy > > + ip addr add 2001:db8:101::1/64 dev dummy1 > > + ip addr add 2001:db8:101::2/64 dev dummy2 > > + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 > > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 > > RTNETLINK answers: File exists > > > > ^^ here the append still failed > And if I play a little bit of the devil's advocate: why is it rejected? It's not > the same route, the types differ :) This is actually for compatibility... > > > > > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 > > + ip -6 route show table 100 > > local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium > > 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium > > What is the purpose of such a routing table? As I replied in last mail. I don't have a clear purpose. A user may use the local route to block traffic temporary. > How is the gateway of a local route used by the kernel? I don't know. IPv6 support this since beginning... > Which route will be used when a packet enters the stack? The first one it find? Thanks Hangbin
On Thu, Sep 14, 2023 at 09:08:18PM -0600, David Ahern wrote: > On 9/14/23 8:23 PM, Hangbin Liu wrote: > > On Fri, Sep 01, 2023 at 11:36:51AM +0200, Nicolas Dichtel wrote: > >>> I do agree now that protocol is informative (passthrough from the kernel > >>> perspective) so not really part of the route. That should be dropped > > > > I'm not sure. Is there any user space route daemon will use this info? e.g. some > > BGP route daemon? > > It is passthrough information for userspace to track who installed the > route. It is not part of the route itself, but metadata passed in and > returned. Thanks. I will drop this info. Hangbin
Le 15/09/2023 à 05:49, Hangbin Liu a écrit : > On Fri, Sep 01, 2023 at 11:50:12AM +0200, Nicolas Dichtel wrote: [snip] >>> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 >>> + ip -6 route show table 100 >>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium >>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium >> >> What is the purpose of such a routing table? > > As I replied in last mail. I don't have a clear purpose. A user may use > the local route to block traffic temporary. > >> How is the gateway of a local route used by the kernel? > > I don't know. IPv6 support this since beginning... > >> Which route will be used when a packet enters the stack? > > The first one it find? Frankly, it seems strange to me to try to fix a scenario without a clear view of what is expected. I agree that the current behavior is questionable, but it enables to keep the system (the routing table) in a clear state. My two cents, Nicolas
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 28b01a068412..f60f5d14f034 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1133,6 +1133,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, rt->fib6_pmtu); return -EEXIST; } + + if (iter->fib6_type != rt->fib6_type || + iter->fib6_protocol != rt->fib6_protocol) + goto next_iter; + /* If we have the same destination and the same metric, * but not the same gateway, then the route we try to * add is sibling to this route, increment our counter
Different with IPv4, IPv6 will auto merge the same metric routes into multipath routes. But the different type and protocol routes are also merged, which will lost user's configure info. e.g. + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100 + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100 + ip -6 route show table 100 local 2001:db8:103::/64 metric 1024 pref medium nexthop via 2001:db8:101::10 dev dummy1 weight 1 nexthop via 2001:db8:101::10 dev dummy2 weight 1 + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200 + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200 + ip -6 route show table 200 2001:db8:104::/64 proto kernel metric 1024 pref medium nexthop via 2001:db8:101::10 dev dummy1 weight 1 nexthop via 2001:db8:101::10 dev dummy2 weight 1 So let's skip counting the different type and protocol routes as siblings. After update, the different type/protocol routes will not be merged. + ip -6 route show table 100 local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium + ip -6 route show table 200 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium Reported-by: Thomas Haller <thaller@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2161994 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> --- All fib test passed: Tests passed: 203 Tests failed: 0 --- net/ipv6/ip6_fib.c | 5 +++++ 1 file changed, 5 insertions(+)