diff mbox series

[net-next] ipv6: do not merge differe type and protocol routes

Message ID 20230830061550.2319741-1-liuhangbin@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next] ipv6: do not merge differe type and protocol routes | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1331 this patch: 1331
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 1355 this patch: 1355
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1354 this patch: 1354
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Hangbin Liu Aug. 30, 2023, 6:15 a.m. UTC
Different with IPv4, IPv6 will auto merge the same metric routes into
multipath routes. But the different type and protocol routes are also
merged, which will lost user's configure info. e.g.

+ ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
+ ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
+ ip -6 route show table 100
local 2001:db8:103::/64 metric 1024 pref medium
        nexthop via 2001:db8:101::10 dev dummy1 weight 1
        nexthop via 2001:db8:101::10 dev dummy2 weight 1

+ ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200
+ ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200
+ ip -6 route show table 200
2001:db8:104::/64 proto kernel metric 1024 pref medium
        nexthop via 2001:db8:101::10 dev dummy1 weight 1
        nexthop via 2001:db8:101::10 dev dummy2 weight 1

So let's skip counting the different type and protocol routes as siblings.
After update, the different type/protocol routes will not be merged.

+ ip -6 route show table 100
local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium

+ ip -6 route show table 200
2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium

Reported-by: Thomas Haller <thaller@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2161994
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
All fib test passed:
Tests passed: 203
Tests failed:   0
---
 net/ipv6/ip6_fib.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

David Ahern Aug. 30, 2023, 2:49 p.m. UTC | #1
On 8/30/23 12:15 AM, Hangbin Liu wrote:
> Different with IPv4, IPv6 will auto merge the same metric routes into
> multipath routes. But the different type and protocol routes are also
> merged, which will lost user's configure info. e.g.
> 
> + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
> + ip -6 route show table 100
> local 2001:db8:103::/64 metric 1024 pref medium
>         nexthop via 2001:db8:101::10 dev dummy1 weight 1
>         nexthop via 2001:db8:101::10 dev dummy2 weight 1
> 
> + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200
> + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200
> + ip -6 route show table 200
> 2001:db8:104::/64 proto kernel metric 1024 pref medium
>         nexthop via 2001:db8:101::10 dev dummy1 weight 1
>         nexthop via 2001:db8:101::10 dev dummy2 weight 1
> 
> So let's skip counting the different type and protocol routes as siblings.
> After update, the different type/protocol routes will not be merged.
> 
> + ip -6 route show table 100
> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
> 
> + ip -6 route show table 200
> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium
> 
> Reported-by: Thomas Haller <thaller@redhat.com>
> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2161994
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
> All fib test passed:
> Tests passed: 203
> Tests failed:   0

Please add the above tests for this case to the fib tests script.

> ---
>  net/ipv6/ip6_fib.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 28b01a068412..f60f5d14f034 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -1133,6 +1133,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
>  							rt->fib6_pmtu);
>  				return -EEXIST;
>  			}
> +
> +			if (iter->fib6_type != rt->fib6_type ||
> +			    iter->fib6_protocol != rt->fib6_protocol)
> +				goto next_iter;
> +
>  			/* If we have the same destination and the same metric,
>  			 * but not the same gateway, then the route we try to
>  			 * add is sibling to this route, increment our counter

Reviewed-by: David Ahern <dsahern@kernel.org>
Nicolas Dichtel Aug. 30, 2023, 3:29 p.m. UTC | #2
Le 30/08/2023 à 08:15, Hangbin Liu a écrit :
> Different with IPv4, IPv6 will auto merge the same metric routes into
> multipath routes. But the different type and protocol routes are also
> merged, which will lost user's configure info. e.g.
> 
> + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
> + ip -6 route show table 100
> local 2001:db8:103::/64 metric 1024 pref medium
>         nexthop via 2001:db8:101::10 dev dummy1 weight 1
>         nexthop via 2001:db8:101::10 dev dummy2 weight 1
> 
> + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200
> + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200
> + ip -6 route show table 200
> 2001:db8:104::/64 proto kernel metric 1024 pref medium
>         nexthop via 2001:db8:101::10 dev dummy1 weight 1
>         nexthop via 2001:db8:101::10 dev dummy2 weight 1
> 
> So let's skip counting the different type and protocol routes as siblings.
> After update, the different type/protocol routes will not be merged.
> 
> + ip -6 route show table 100
> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
> 
> + ip -6 route show table 200
> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium

This seems wrong. The goal of 'ip route append' is to add a next hop, not to
create a new route. Ok, it adds a new route if no route exists, but it seems
wrong to me to use it by default, instead of 'add', to make things work magically.

It seems more correct to return an error in these cases, but this will change
the uapi and it may break existing setups.

Before this patch, both next hops could be used by the kernel. After it, one
route will be ignored (the former or the last one?). This is confusing and also
seems wrong.

> 
> Reported-by: Thomas Haller <thaller@redhat.com>
> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2161994
Please, don't put private link. The bug entry is not public.

Can you explain what is the initial problem?


> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
> All fib test passed:
> Tests passed: 203
> Tests failed:   0
> ---
>  net/ipv6/ip6_fib.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 28b01a068412..f60f5d14f034 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -1133,6 +1133,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
>  							rt->fib6_pmtu);
>  				return -EEXIST;
>  			}
> +
> +			if (iter->fib6_type != rt->fib6_type ||
> +			    iter->fib6_protocol != rt->fib6_protocol)
> +				goto next_iter;
> +
>  			/* If we have the same destination and the same metric,
>  			 * but not the same gateway, then the route we try to
>  			 * add is sibling to this route, increment our counter
David Ahern Aug. 30, 2023, 6:57 p.m. UTC | #3
On 8/30/23 9:29 AM, Nicolas Dichtel wrote:
> Le 30/08/2023 à 08:15, Hangbin Liu a écrit :
>> Different with IPv4, IPv6 will auto merge the same metric routes into
>> multipath routes. But the different type and protocol routes are also
>> merged, which will lost user's configure info. e.g.
>>
>> + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
>> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
>> + ip -6 route show table 100
>> local 2001:db8:103::/64 metric 1024 pref medium
>>         nexthop via 2001:db8:101::10 dev dummy1 weight 1
>>         nexthop via 2001:db8:101::10 dev dummy2 weight 1
>>
>> + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200
>> + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200
>> + ip -6 route show table 200
>> 2001:db8:104::/64 proto kernel metric 1024 pref medium
>>         nexthop via 2001:db8:101::10 dev dummy1 weight 1
>>         nexthop via 2001:db8:101::10 dev dummy2 weight 1
>>
>> So let's skip counting the different type and protocol routes as siblings.
>> After update, the different type/protocol routes will not be merged.
>>
>> + ip -6 route show table 100
>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
>>
>> + ip -6 route show table 200
>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium
> 
> This seems wrong. The goal of 'ip route append' is to add a next hop, not to
> create a new route. Ok, it adds a new route if no route exists, but it seems
> wrong to me to use it by default, instead of 'add', to make things work magically.

Legacy API; nothing can be done about that (ie., that append makes a new
route when none exists).

> 
> It seems more correct to return an error in these cases, but this will change
> the uapi and it may break existing setups.
> 
> Before this patch, both next hops could be used by the kernel. After it, one
> route will be ignored (the former or the last one?). This is confusing and also
> seems wrong.

Append should match all details of a route to add to an existing entry
and make it multipath. If there is a difference (especially the type -
protocol difference is arguable) in attributes, then they are different
routes.
Hangbin Liu Aug. 30, 2023, 11:51 p.m. UTC | #4
On Wed, Aug 30, 2023 at 08:49:18AM -0600, David Ahern wrote:
> On 8/30/23 12:15 AM, Hangbin Liu wrote:
> > Different with IPv4, IPv6 will auto merge the same metric routes into
> > multipath routes. But the different type and protocol routes are also
> > merged, which will lost user's configure info. e.g.
> > 
> > + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
> > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
> > + ip -6 route show table 100
> > local 2001:db8:103::/64 metric 1024 pref medium
> >         nexthop via 2001:db8:101::10 dev dummy1 weight 1
> >         nexthop via 2001:db8:101::10 dev dummy2 weight 1
> > 
> > + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200
> > + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200
> > + ip -6 route show table 200
> > 2001:db8:104::/64 proto kernel metric 1024 pref medium
> >         nexthop via 2001:db8:101::10 dev dummy1 weight 1
> >         nexthop via 2001:db8:101::10 dev dummy2 weight 1
> > 
> > So let's skip counting the different type and protocol routes as siblings.
> > After update, the different type/protocol routes will not be merged.
> > 
> > + ip -6 route show table 100
> > local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
> > 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
> > 
> > + ip -6 route show table 200
> > 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
> > 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium
> > 
> > Reported-by: Thomas Haller <thaller@redhat.com>
> > Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2161994
> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> > ---
> > All fib test passed:
> > Tests passed: 203
> > Tests failed:   0
> 
> Please add the above tests for this case to the fib tests script.

OK, I will add them and re-post after net-next open.

Thanks
Hangbin
Nicolas Dichtel Aug. 31, 2023, 8:17 a.m. UTC | #5
Le 30/08/2023 à 20:57, David Ahern a écrit :
> On 8/30/23 9:29 AM, Nicolas Dichtel wrote:
>> Le 30/08/2023 à 08:15, Hangbin Liu a écrit :
>>> Different with IPv4, IPv6 will auto merge the same metric routes into
>>> multipath routes. But the different type and protocol routes are also
>>> merged, which will lost user's configure info. e.g.
>>>
>>> + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
>>> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
>>> + ip -6 route show table 100
>>> local 2001:db8:103::/64 metric 1024 pref medium
>>>         nexthop via 2001:db8:101::10 dev dummy1 weight 1
>>>         nexthop via 2001:db8:101::10 dev dummy2 weight 1
>>>
>>> + ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200
>>> + ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200
>>> + ip -6 route show table 200
>>> 2001:db8:104::/64 proto kernel metric 1024 pref medium
>>>         nexthop via 2001:db8:101::10 dev dummy1 weight 1
>>>         nexthop via 2001:db8:101::10 dev dummy2 weight 1
>>>
>>> So let's skip counting the different type and protocol routes as siblings.
>>> After update, the different type/protocol routes will not be merged.
>>>
>>> + ip -6 route show table 100
>>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
>>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
>>>
>>> + ip -6 route show table 200
>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium
>>
>> This seems wrong. The goal of 'ip route append' is to add a next hop, not to
>> create a new route. Ok, it adds a new route if no route exists, but it seems
>> wrong to me to use it by default, instead of 'add', to make things work magically.
> 
> Legacy API; nothing can be done about that (ie., that append makes a new
> route when none exists).
> 
>>
>> It seems more correct to return an error in these cases, but this will change
>> the uapi and it may break existing setups.
>>
>> Before this patch, both next hops could be used by the kernel. After it, one
>> route will be ignored (the former or the last one?). This is confusing and also
>> seems wrong.
> 
> Append should match all details of a route to add to an existing entry
> and make it multipath. If there is a difference (especially the type -
> protocol difference is arguable) in attributes, then they are different
> routes.
> 

As you said, the protocol difference is arguable. It's not a property of the
route, just a hint.
I think the 'append' should match a route whatever the protocol is.
'ip route change' for example does not use the protocol to find the existing
route, it will update it:

$ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1
$ ip -6 route
2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium
$ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp
$ ip -6 route
2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium
$ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel
$ ip -6 route
2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium

Why would 'append' selects route differently?

This patch breaks the legacy API.
Hangbin Liu Aug. 31, 2023, 10:14 a.m. UTC | #6
Hi Nicolas,
On Thu, Aug 31, 2023 at 10:17:19AM +0200, Nicolas Dichtel wrote:
> >>> So let's skip counting the different type and protocol routes as siblings.
> >>> After update, the different type/protocol routes will not be merged.
> >>>
> >>> + ip -6 route show table 100
> >>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
> >>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
> >>>
> >>> + ip -6 route show table 200
> >>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
> >>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium
> >>
> >> This seems wrong. The goal of 'ip route append' is to add a next hop, not to
> >> create a new route. Ok, it adds a new route if no route exists, but it seems
> >> wrong to me to use it by default, instead of 'add', to make things work magically.
> > 
> > Legacy API; nothing can be done about that (ie., that append makes a new
> > route when none exists).
> > 
> >>
> >> It seems more correct to return an error in these cases, but this will change
> >> the uapi and it may break existing setups.
> >>
> >> Before this patch, both next hops could be used by the kernel. After it, one
> >> route will be ignored (the former or the last one?). This is confusing and also
> >> seems wrong.
> > 
> > Append should match all details of a route to add to an existing entry
> > and make it multipath. If there is a difference (especially the type -
> > protocol difference is arguable) in attributes, then they are different
> > routes.
> > 
> 
> As you said, the protocol difference is arguable. It's not a property of the
> route, just a hint.
> I think the 'append' should match a route whatever the protocol is.
> 'ip route change' for example does not use the protocol to find the existing
> route, it will update it:
> 
> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1
> $ ip -6 route
> 2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium
> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp
> $ ip -6 route
> 2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium
> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel
> $ ip -6 route
> 2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium

Not sure if I understand correctly, `ip route replace` should able to
replace all other field other than dest and dev. It's for changing the route,
not only nexthop.
> 
> Why would 'append' selects route differently?

The append should also works for a single route, not only for append nexthop, no?

> 
> This patch breaks the legacy API.

As the patch's description. Who would expect different type/protocol route
should be merged as multipath route? I don't think the old API is correct.

Thanks
Hangbin
Nicolas Dichtel Aug. 31, 2023, 11:58 a.m. UTC | #7
Le 31/08/2023 à 12:14, Hangbin Liu a écrit :
> Hi Nicolas,
> On Thu, Aug 31, 2023 at 10:17:19AM +0200, Nicolas Dichtel wrote:
>>>>> So let's skip counting the different type and protocol routes as siblings.
>>>>> After update, the different type/protocol routes will not be merged.
>>>>>
>>>>> + ip -6 route show table 100
>>>>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
>>>>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
>>>>>
>>>>> + ip -6 route show table 200
>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium
>>>>
>>>> This seems wrong. The goal of 'ip route append' is to add a next hop, not to
>>>> create a new route. Ok, it adds a new route if no route exists, but it seems
>>>> wrong to me to use it by default, instead of 'add', to make things work magically.
>>>
>>> Legacy API; nothing can be done about that (ie., that append makes a new
>>> route when none exists).
>>>
>>>>
>>>> It seems more correct to return an error in these cases, but this will change
>>>> the uapi and it may break existing setups.
>>>>
>>>> Before this patch, both next hops could be used by the kernel. After it, one
>>>> route will be ignored (the former or the last one?). This is confusing and also
>>>> seems wrong.
>>>
>>> Append should match all details of a route to add to an existing entry
>>> and make it multipath. If there is a difference (especially the type -
>>> protocol difference is arguable) in attributes, then they are different
>>> routes.
>>>
>>
>> As you said, the protocol difference is arguable. It's not a property of the
>> route, just a hint.
>> I think the 'append' should match a route whatever the protocol is.
>> 'ip route change' for example does not use the protocol to find the existing
>> route, it will update it:
>>
>> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1
>> $ ip -6 route
>> 2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium
>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp
>> $ ip -6 route
>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium
>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel
>> $ ip -6 route
>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium
> 
> Not sure if I understand correctly, `ip route replace` should able to
> replace all other field other than dest and dev. It's for changing the route,
> not only nexthop.
>>
>> Why would 'append' selects route differently?
> 
> The append should also works for a single route, not only for append nexthop, no?
I don't think so. The 'append' should 'join', not add. Adding more cases where a
route is added instead of appended doesn't make the API clearer.

With this patch, it will be possible to add a new route with the 'append'
command when the 'add' command fails:
$ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
$ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
RTNETLINK answers: File exists

$ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200
$ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200
RTNETLINK answers: File exists

This makes the API more confusing and complex. And I don't understand how it
will be used later. There will be 2 routes on the system, but only one will be
used, which one? This is confusing.

> 
>>
>> This patch breaks the legacy API.
> 
> As the patch's description. Who would expect different type/protocol route
> should be merged as multipath route? I don't think the old API is correct.
The question is not 'who expect', but 'is there some systems somewhere that rely
on this (deliberately or not)'.
Frankly, the protocol is just informative, so I don't see why it is a problem to
ignore it with the 'append' command.
For the type, it is weird, for sure. Rejecting the command seems better than
duplicating routes. Which route is used by the stack?


Regards,
Nicolas
David Ahern Aug. 31, 2023, 6:27 p.m. UTC | #8
On 8/31/23 5:58 AM, Nicolas Dichtel wrote:
> Le 31/08/2023 à 12:14, Hangbin Liu a écrit :
>> Hi Nicolas,
>> On Thu, Aug 31, 2023 at 10:17:19AM +0200, Nicolas Dichtel wrote:
>>>>>> So let's skip counting the different type and protocol routes as siblings.
>>>>>> After update, the different type/protocol routes will not be merged.
>>>>>>
>>>>>> + ip -6 route show table 100
>>>>>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
>>>>>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
>>>>>>
>>>>>> + ip -6 route show table 200
>>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
>>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium
>>>>>
>>>>> This seems wrong. The goal of 'ip route append' is to add a next hop, not to
>>>>> create a new route. Ok, it adds a new route if no route exists, but it seems
>>>>> wrong to me to use it by default, instead of 'add', to make things work magically.
>>>>
>>>> Legacy API; nothing can be done about that (ie., that append makes a new
>>>> route when none exists).
>>>>
>>>>>
>>>>> It seems more correct to return an error in these cases, but this will change
>>>>> the uapi and it may break existing setups.
>>>>>
>>>>> Before this patch, both next hops could be used by the kernel. After it, one
>>>>> route will be ignored (the former or the last one?). This is confusing and also
>>>>> seems wrong.
>>>>
>>>> Append should match all details of a route to add to an existing entry
>>>> and make it multipath. If there is a difference (especially the type -
>>>> protocol difference is arguable) in attributes, then they are different
>>>> routes.
>>>>
>>>
>>> As you said, the protocol difference is arguable. It's not a property of the
>>> route, just a hint.
>>> I think the 'append' should match a route whatever the protocol is.
>>> 'ip route change' for example does not use the protocol to find the existing
>>> route, it will update it:
>>>
>>> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1
>>> $ ip -6 route
>>> 2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium
>>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp
>>> $ ip -6 route
>>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium
>>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel
>>> $ ip -6 route
>>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium
>>
>> Not sure if I understand correctly, `ip route replace` should able to
>> replace all other field other than dest and dev. It's for changing the route,
>> not only nexthop.
>>>
>>> Why would 'append' selects route differently?
>>
>> The append should also works for a single route, not only for append nexthop, no?
> I don't think so. The 'append' should 'join', not add. Adding more cases where a
> route is added instead of appended doesn't make the API clearer.
> 
> With this patch, it will be possible to add a new route with the 'append'
> command when the 'add' command fails:
> $ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
> $ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
> RTNETLINK answers: File exists
> 
> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200
> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200
> RTNETLINK answers: File exists
> 
> This makes the API more confusing and complex. And I don't understand how it
> will be used later. There will be 2 routes on the system, but only one will be
> used, which one? This is confusing.
> 
>>
>>>
>>> This patch breaks the legacy API.
>>
>> As the patch's description. Who would expect different type/protocol route
>> should be merged as multipath route? I don't think the old API is correct.
> The question is not 'who expect', but 'is there some systems somewhere that rely
> on this (deliberately or not)'.
> Frankly, the protocol is just informative, so I don't see why it is a problem to
> ignore it with the 'append' command.
> For the type, it is weird, for sure. Rejecting the command seems better than
> duplicating routes. Which route is used by the stack?
> 
> 

Part of my intent with fib_tests.sh was to document the legacy meaning
of 'append, prepend, replace, and change' options while also providing a
test script to detect changes that cause a regression.

I do agree now that protocol is informative (passthrough from the kernel
perspective) so not really part of the route. That should be dropped
from the patch leaving just a check on rt_type as to whether the routes
are different. From there the append, prepend, replace and change
semantics should decide what happens (ie., how the route is inserted).
Hangbin Liu Sept. 1, 2023, 3:58 a.m. UTC | #9
On Thu, Aug 31, 2023 at 01:58:48PM +0200, Nicolas Dichtel wrote:
> > The append should also works for a single route, not only for append nexthop, no?
> I don't think so. The 'append' should 'join', not add. Adding more cases where a
> route is added instead of appended doesn't make the API clearer.
> 
> With this patch, it will be possible to add a new route with the 'append'
> command when the 'add' command fails:
> $ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
> $ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
> RTNETLINK answers: File exists
> 
> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200
> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200
> RTNETLINK answers: File exists
> 
> This makes the API more confusing and complex. And I don't understand how it
> will be used later. There will be 2 routes on the system, but only one will be
> used, which one? This is confusing.

Just to makeit it clear, the new patch will not add two route with only
different type/protocol. Here is the result with my patch.

+ ip -6 route flush table 300
+ ip link add dummy1 up type dummy
+ ip link add dummy2 up type dummy
+ ip addr add 2001:db8:101::1/64 dev dummy1
+ ip addr add 2001:db8:101::2/64 dev dummy2
+ ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
+ ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
RTNETLINK answers: File exists

     ^^ here the append still failed

+ ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
+ ip -6 route show table 100
local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
+ ip route add 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel table 200
+ ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto bgp table 200
RTNETLINK answers: File exists

     ^^ And here

+ ip route append 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp table 200
+ ip -6 route show table 200
2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium

Thanks
Hangbin
Nicolas Dichtel Sept. 1, 2023, 9:36 a.m. UTC | #10
Le 31/08/2023 à 20:27, David Ahern a écrit :
> On 8/31/23 5:58 AM, Nicolas Dichtel wrote:
>> Le 31/08/2023 à 12:14, Hangbin Liu a écrit :
>>> Hi Nicolas,
>>> On Thu, Aug 31, 2023 at 10:17:19AM +0200, Nicolas Dichtel wrote:
>>>>>>> So let's skip counting the different type and protocol routes as siblings.
>>>>>>> After update, the different type/protocol routes will not be merged.
>>>>>>>
>>>>>>> + ip -6 route show table 100
>>>>>>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
>>>>>>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
>>>>>>>
>>>>>>> + ip -6 route show table 200
>>>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy1 proto kernel metric 1024 pref medium
>>>>>>> 2001:db8:104::/64 via 2001:db8:101::10 dev dummy2 proto bgp metric 1024 pref medium
>>>>>>
>>>>>> This seems wrong. The goal of 'ip route append' is to add a next hop, not to
>>>>>> create a new route. Ok, it adds a new route if no route exists, but it seems
>>>>>> wrong to me to use it by default, instead of 'add', to make things work magically.
>>>>>
>>>>> Legacy API; nothing can be done about that (ie., that append makes a new
>>>>> route when none exists).
>>>>>
>>>>>>
>>>>>> It seems more correct to return an error in these cases, but this will change
>>>>>> the uapi and it may break existing setups.
>>>>>>
>>>>>> Before this patch, both next hops could be used by the kernel. After it, one
>>>>>> route will be ignored (the former or the last one?). This is confusing and also
>>>>>> seems wrong.
>>>>>
>>>>> Append should match all details of a route to add to an existing entry
>>>>> and make it multipath. If there is a difference (especially the type -
>>>>> protocol difference is arguable) in attributes, then they are different
>>>>> routes.
>>>>>
>>>>
>>>> As you said, the protocol difference is arguable. It's not a property of the
>>>> route, just a hint.
>>>> I think the 'append' should match a route whatever the protocol is.
>>>> 'ip route change' for example does not use the protocol to find the existing
>>>> route, it will update it:
>>>>
>>>> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1
>>>> $ ip -6 route
>>>> 2003:1:2:3::/64 via 2001::2 dev eth1 metric 1024 pref medium
>>>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp
>>>> $ ip -6 route
>>>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto bgp metric 1024 pref medium
>>>> $ ip -6 route change 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel
>>>> $ ip -6 route
>>>> 2003:1:2:3::/64 via 2001::2 dev eth1 proto kernel metric 1024 pref medium
>>>
>>> Not sure if I understand correctly, `ip route replace` should able to
>>> replace all other field other than dest and dev. It's for changing the route,
>>> not only nexthop.
>>>>
>>>> Why would 'append' selects route differently?
>>>
>>> The append should also works for a single route, not only for append nexthop, no?
>> I don't think so. The 'append' should 'join', not add. Adding more cases where a
>> route is added instead of appended doesn't make the API clearer.
>>
>> With this patch, it will be possible to add a new route with the 'append'
>> command when the 'add' command fails:
>> $ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
>> $ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
>> RTNETLINK answers: File exists
>>
>> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200
>> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200
>> RTNETLINK answers: File exists
>>
>> This makes the API more confusing and complex. And I don't understand how it
>> will be used later. There will be 2 routes on the system, but only one will be
>> used, which one? This is confusing.
>>
>>>
>>>>
>>>> This patch breaks the legacy API.
>>>
>>> As the patch's description. Who would expect different type/protocol route
>>> should be merged as multipath route? I don't think the old API is correct.
>> The question is not 'who expect', but 'is there some systems somewhere that rely
>> on this (deliberately or not)'.
>> Frankly, the protocol is just informative, so I don't see why it is a problem to
>> ignore it with the 'append' command.
>> For the type, it is weird, for sure. Rejecting the command seems better than
>> duplicating routes. Which route is used by the stack?
>>
>>
> 
> Part of my intent with fib_tests.sh was to document the legacy meaning
> of 'append, prepend, replace, and change' options while also providing a
> test script to detect changes that cause a regression.
Yes, it's a good idea for sure.

> 
> I do agree now that protocol is informative (passthrough from the kernel
> perspective) so not really part of the route. That should be dropped
> from the patch leaving just a check on rt_type as to whether the routes
> are different. From there the append, prepend, replace and change
> semantics should decide what happens (ie., how the route is inserted).
Right. What can guide us is the meaning/concept/benefit of having this kind of
routing table:
local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium

I don't understand how this is used/useful. It's why I ask for the use case/goal
of this patch.
How does the user know which route is used?


Regards,
Nicolas
Nicolas Dichtel Sept. 1, 2023, 9:50 a.m. UTC | #11
Le 01/09/2023 à 05:58, Hangbin Liu a écrit :
> On Thu, Aug 31, 2023 at 01:58:48PM +0200, Nicolas Dichtel wrote:
>>> The append should also works for a single route, not only for append nexthop, no?
>> I don't think so. The 'append' should 'join', not add. Adding more cases where a
>> route is added instead of appended doesn't make the API clearer.
It was clear to me. I distinguish the route part and the next hop part.

>>
>> With this patch, it will be possible to add a new route with the 'append'
>> command when the 'add' command fails:
>> $ ip -6 route add local 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
>> $ ip -6 route add unicast 2003:1:2:3::/64 via 2001::2 dev eth1 table 200
>> RTNETLINK answers: File exists
>>
>> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol bgp table 200
>> $ ip -6 route add 2003:1:2:3::/64 via 2001::2 dev eth1 protocol kernel table 200
>> RTNETLINK answers: File exists
>>
>> This makes the API more confusing and complex. And I don't understand how it
>> will be used later. There will be 2 routes on the system, but only one will be
>> used, which one? This is confusing.
> 
> Just to makeit it clear, the new patch will not add two route with only
> different type/protocol. Here is the result with my patch.
> 
> + ip -6 route flush table 300
> + ip link add dummy1 up type dummy
> + ip link add dummy2 up type dummy
> + ip addr add 2001:db8:101::1/64 dev dummy1
> + ip addr add 2001:db8:101::2/64 dev dummy2
> + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
> RTNETLINK answers: File exists
> 
>      ^^ here the append still failed
And if I play a little bit of the devil's advocate: why is it rejected? It's not
the same route, the types differ :)

> 
> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
> + ip -6 route show table 100
> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium

What is the purpose of such a routing table?
How is the gateway of a local route used by the kernel?
Which route will be used when a packet enters the stack?


Regards,
Nicolas
Hangbin Liu Sept. 15, 2023, 2:23 a.m. UTC | #12
On Fri, Sep 01, 2023 at 11:36:51AM +0200, Nicolas Dichtel wrote:
> > I do agree now that protocol is informative (passthrough from the kernel
> > perspective) so not really part of the route. That should be dropped

I'm not sure. Is there any user space route daemon will use this info? e.g. some
BGP route daemon?

> > from the patch leaving just a check on rt_type as to whether the routes
> > are different. From there the append, prepend, replace and change
> > semantics should decide what happens (ie., how the route is inserted).
> Right. What can guide us is the meaning/concept/benefit of having this kind of
> routing table:
> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
> 
> I don't understand how this is used/useful. It's why I ask for the use case/goal
> of this patch.
> How does the user know which route is used?

I'm not sure how user will use it. Maybe just block/forward some traffic to
local first and remove the local route later to unblock them. IPv4 can also
do like this. e.g.

+ ip link add dummy1 up type dummy
+ ip link add dummy2 up type dummy
+ ip addr add 192.168.0.1/24 dev dummy1
+ ip addr add 192.168.0.2/24 dev dummy2
+ ip route add local 192.168.3.0/24 dev dummy1 table 100
+ ip route append unicast 192.168.3.0/24 dev dummy1 table 100
+ ip route append unicast 192.168.3.0/24 dev dummy2 table 100
+ ip route show table 100
local 192.168.3.0/24 dev dummy1 scope host
192.168.3.0/24 dev dummy1 scope link
192.168.3.0/24 dev dummy2 scope link

Thanks
Hangbin
David Ahern Sept. 15, 2023, 3:08 a.m. UTC | #13
On 9/14/23 8:23 PM, Hangbin Liu wrote:
> On Fri, Sep 01, 2023 at 11:36:51AM +0200, Nicolas Dichtel wrote:
>>> I do agree now that protocol is informative (passthrough from the kernel
>>> perspective) so not really part of the route. That should be dropped
> 
> I'm not sure. Is there any user space route daemon will use this info? e.g. some
> BGP route daemon?

It is passthrough information for userspace to track who installed the
route. It is not part of the route itself, but metadata passed in and
returned.
Hangbin Liu Sept. 15, 2023, 3:49 a.m. UTC | #14
On Fri, Sep 01, 2023 at 11:50:12AM +0200, Nicolas Dichtel wrote:
> > + ip -6 route flush table 300
> > + ip link add dummy1 up type dummy
> > + ip link add dummy2 up type dummy
> > + ip addr add 2001:db8:101::1/64 dev dummy1
> > + ip addr add 2001:db8:101::2/64 dev dummy2
> > + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
> > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
> > RTNETLINK answers: File exists
> > 
> >      ^^ here the append still failed
> And if I play a little bit of the devil's advocate: why is it rejected? It's not
> the same route, the types differ :)

This is actually for compatibility...
> 
> > 
> > + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
> > + ip -6 route show table 100
> > local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
> > 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
> 
> What is the purpose of such a routing table?

As I replied in last mail. I don't have a clear purpose. A user may use
the local route to block traffic temporary.

> How is the gateway of a local route used by the kernel?

I don't know. IPv6 support this since beginning...

> Which route will be used when a packet enters the stack?

The first one it find?

Thanks
Hangbin
Hangbin Liu Sept. 15, 2023, 10:02 a.m. UTC | #15
On Thu, Sep 14, 2023 at 09:08:18PM -0600, David Ahern wrote:
> On 9/14/23 8:23 PM, Hangbin Liu wrote:
> > On Fri, Sep 01, 2023 at 11:36:51AM +0200, Nicolas Dichtel wrote:
> >>> I do agree now that protocol is informative (passthrough from the kernel
> >>> perspective) so not really part of the route. That should be dropped
> > 
> > I'm not sure. Is there any user space route daemon will use this info? e.g. some
> > BGP route daemon?
> 
> It is passthrough information for userspace to track who installed the
> route. It is not part of the route itself, but metadata passed in and
> returned.

Thanks. I will drop this info.

Hangbin
Nicolas Dichtel Sept. 15, 2023, 3:58 p.m. UTC | #16
Le 15/09/2023 à 05:49, Hangbin Liu a écrit :
> On Fri, Sep 01, 2023 at 11:50:12AM +0200, Nicolas Dichtel wrote:
[snip]
>>> + ip route append unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 100
>>> + ip -6 route show table 100
>>> local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 metric 1024 pref medium
>>> 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 metric 1024 pref medium
>>
>> What is the purpose of such a routing table?
> 
> As I replied in last mail. I don't have a clear purpose. A user may use
> the local route to block traffic temporary.
> 
>> How is the gateway of a local route used by the kernel?
> 
> I don't know. IPv6 support this since beginning...
> 
>> Which route will be used when a packet enters the stack?
> 
> The first one it find?
Frankly, it seems strange to me to try to fix a scenario without a clear view of
what is expected. I agree that the current behavior is questionable, but it
enables to keep the system (the routing table) in a clear state.


My two cents,
Nicolas
diff mbox series

Patch

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 28b01a068412..f60f5d14f034 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1133,6 +1133,11 @@  static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 							rt->fib6_pmtu);
 				return -EEXIST;
 			}
+
+			if (iter->fib6_type != rt->fib6_type ||
+			    iter->fib6_protocol != rt->fib6_protocol)
+				goto next_iter;
+
 			/* If we have the same destination and the same metric,
 			 * but not the same gateway, then the route we try to
 			 * add is sibling to this route, increment our counter