diff mbox series

[PATCHv3,net] ipv6: do not match device when remove source route

Message ID 20230720065941.3294051-1-liuhangbin@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [PATCHv3,net] ipv6: do not match device when remove source route | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1343 this patch: 1343
netdev/cc_maintainers fail 1 blamed authors not CCed: sahne@0x90.at; 3 maintainers not CCed: linux-kselftest@vger.kernel.org shuah@kernel.org sahne@0x90.at
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success net selftest script(s) already in Makefile
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1366 this patch: 1366
netdev/checkpatch warning WARNING: line length of 256 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 92 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Hangbin Liu July 20, 2023, 6:59 a.m. UTC
After deleting an IPv6 address on an interface and cleaning up the
related preferred source entries, it is important to ensure that all
routes associated with the deleted address are properly cleared. The
current implementation of rt6_remove_prefsrc() only checks the preferred
source addresses bound to the current device. However, there may be
routes that are bound to other devices but still utilize the same
preferred source address.

To address this issue, it is necessary to also delete entries that are
bound to other interfaces but share the same source address with the
current device. Failure to delete these entries would leave routes that
are bound to the deleted address unclear. Here is an example reproducer
(I have omitted unrelated routes):

+ ip link add dummy1 type dummy
+ ip link add dummy2 type dummy
+ ip link set dummy1 up
+ ip link set dummy2 up
+ ip addr add 1:2:3:4::5/64 dev dummy1
+ ip route add 7:7:7:0::1 dev dummy1 src 1:2:3:4::5
+ ip route add 7:7:7:0::2 dev dummy2 src 1:2:3:4::5
+ ip -6 route show
1:2:3:4::/64 dev dummy1 proto kernel metric 256 pref medium
7:7:7::1 dev dummy1 src 1:2:3:4::5 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium
+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium

After fix:

+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 metric 1024 pref medium

Ido notified that there is a commit 5a56a0b3a45d ("net: Don't delete
routes in different VRFs") to not affect the route in different VRFs.
So let's remove the rt dev checking and add an table id checking.
Also remove the !rt-nh checking to clear the IPv6 routes that are using
a nexthop object. This would be consistent with IPv4.

A ipv6_del_addr test is added for fib_tests.sh. Note that instead
of removing the whole route for IPv4, IPv6 only remove the preferred
source address for source routing. So in the testing use
"grep -q src $src_ipv6_address" instead of "grep -q $dst_ipv6_subnet/64"
when checking if the source route deleted.

Here is the fib_tests.sh ipv6_del_addr test result.

Before fix:
IPv6 delete address route tests
    Regular FIB info
    TEST: Prefsrc removed from VRF when source address deleted            [ OK ]
    TEST: Prefsrc in default VRF not removed                              [FAIL]
    TEST: Prefsrc removed in default VRF when source address deleted      [ OK ]
    TEST: Prefsrc in VRF is not removed by address delete                 [FAIL]
    Identical FIB info with different table ID
    TEST: Prefsrc removed from VRF when source address deleted            [ OK ]
    TEST: Prefsrc in default VRF not removed                              [FAIL]
    TEST: Prefsrc removed in default VRF when source address deleted      [ OK ]
    TEST: Prefsrc in VRF is not removed by address delete                 [FAIL]
    Table ID 0
    TEST: Prefsrc removed in default VRF when source address deleted      [ OK ]

After fix:
IPv6 delete address route tests
    Regular FIB info
    TEST: Prefsrc removed from VRF when source address deleted            [ OK ]
    TEST: Prefsrc in default VRF not removed                              [ OK ]
    TEST: Prefsrc removed in default VRF when source address deleted      [ OK ]
    TEST: Prefsrc in VRF is not removed by address delete                 [ OK ]
    Identical FIB info with different table ID
    TEST: Prefsrc removed from VRF when source address deleted            [ OK ]
    TEST: Prefsrc in default VRF not removed                              [ OK ]
    TEST: Prefsrc removed in default VRF when source address deleted      [ OK ]
    TEST: Prefsrc in VRF is not removed by address delete                 [ OK ]
    Table ID 0
    TEST: Prefsrc removed in default VRF when source address deleted      [ OK ]

Reported-by: Thomas Haller <thaller@redhat.com>
Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
v3: remove rt nh checking. update the ipv6_del_addr test descriptions
v2: checking table id and update fib_test.sh
---
 net/ipv6/route.c                         |  6 +-
 tools/testing/selftests/net/fib_tests.sh | 93 +++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 4 deletions(-)

Comments

Ido Schimmel July 20, 2023, 1:22 p.m. UTC | #1
On Thu, Jul 20, 2023 at 02:59:41PM +0800, Hangbin Liu wrote:
> After deleting an IPv6 address on an interface and cleaning up the
> related preferred source entries, it is important to ensure that all
> routes associated with the deleted address are properly cleared. The
> current implementation of rt6_remove_prefsrc() only checks the preferred
> source addresses bound to the current device. However, there may be
> routes that are bound to other devices but still utilize the same
> preferred source address.
> 
> To address this issue, it is necessary to also delete entries that are
> bound to other interfaces but share the same source address with the
> current device. Failure to delete these entries would leave routes that
> are bound to the deleted address unclear. Here is an example reproducer
> (I have omitted unrelated routes):

[...]

> Ido notified that there is a commit 5a56a0b3a45d ("net: Don't delete
> routes in different VRFs") to not affect the route in different VRFs.
> So let's remove the rt dev checking and add an table id checking.
> Also remove the !rt-nh checking to clear the IPv6 routes that are using
> a nexthop object. This would be consistent with IPv4.
> 
> A ipv6_del_addr test is added for fib_tests.sh. Note that instead
> of removing the whole route for IPv4, IPv6 only remove the preferred
> source address for source routing. So in the testing use
> "grep -q src $src_ipv6_address" instead of "grep -q $dst_ipv6_subnet/64"
> when checking if the source route deleted.
> 
> Here is the fib_tests.sh ipv6_del_addr test result.

[...]

> 
> Reported-by: Thomas Haller <thaller@redhat.com>
> Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection")
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

One nit below

[...]

> @@ -1869,6 +1869,96 @@ ipv4_del_addr_test()
>  	cleanup
>  }
>  
> +ipv6_del_addr_test()
> +{

[...]

> +}
> +
>  

Double blank line
Ido Schimmel July 20, 2023, 2:49 p.m. UTC | #2
On Thu, Jul 20, 2023 at 04:22:11PM +0300, Ido Schimmel wrote:
> On Thu, Jul 20, 2023 at 02:59:41PM +0800, Hangbin Liu wrote:
> > After deleting an IPv6 address on an interface and cleaning up the
> > related preferred source entries, it is important to ensure that all
> > routes associated with the deleted address are properly cleared. The
> > current implementation of rt6_remove_prefsrc() only checks the preferred
> > source addresses bound to the current device. However, there may be
> > routes that are bound to other devices but still utilize the same
> > preferred source address.
> > 
> > To address this issue, it is necessary to also delete entries that are
> > bound to other interfaces but share the same source address with the
> > current device. Failure to delete these entries would leave routes that
> > are bound to the deleted address unclear. Here is an example reproducer
> > (I have omitted unrelated routes):
> 
> [...]
> 
> > Ido notified that there is a commit 5a56a0b3a45d ("net: Don't delete
> > routes in different VRFs") to not affect the route in different VRFs.
> > So let's remove the rt dev checking and add an table id checking.
> > Also remove the !rt-nh checking to clear the IPv6 routes that are using
> > a nexthop object. This would be consistent with IPv4.
> > 
> > A ipv6_del_addr test is added for fib_tests.sh. Note that instead
> > of removing the whole route for IPv4, IPv6 only remove the preferred
> > source address for source routing. So in the testing use
> > "grep -q src $src_ipv6_address" instead of "grep -q $dst_ipv6_subnet/64"
> > when checking if the source route deleted.
> > 
> > Here is the fib_tests.sh ipv6_del_addr test result.
> 
> [...]
> 
> > 
> > Reported-by: Thomas Haller <thaller@redhat.com>
> > Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection")
> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> 
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>

Actually, there is another problem here. IPv4 checks that the address is
indeed gone (it can be assigned to more than one interface):

+ ip link add name dummy1 up type dummy
+ ip link add name dummy2 up type dummy
+ ip link add name dummy3 up type dummy
+ ip address add 192.0.2.1/24 dev dummy1
+ ip address add 192.0.2.1/24 dev dummy2
+ ip route add 198.51.100.0/24 dev dummy3 src 192.0.2.1
+ ip address del 192.0.2.1/24 dev dummy2
+ ip -4 r s
192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1 
198.51.100.0/24 dev dummy3 scope link src 192.0.2.1 

But it doesn't happen for IPv6:

+ ip link add name dummy1 up type dummy
+ ip link add name dummy2 up type dummy
+ ip link add name dummy3 up type dummy
+ ip address add 2001:db8:1::1/64 dev dummy1
+ ip address add 2001:db8:1::1/64 dev dummy2
+ ip route add 2001:db8:2::/64 dev dummy3 src 2001:db8:1::1
+ ip address del 2001:db8:1::1/64 dev dummy2
+ ip -6 r s
2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
2001:db8:2::/64 dev dummy3 metric 1024 pref medium
fe80::/64 dev dummy1 proto kernel metric 256 pref medium
fe80::/64 dev dummy2 proto kernel metric 256 pref medium
fe80::/64 dev dummy3 proto kernel metric 256 pref medium
Hangbin Liu July 21, 2023, 8:59 a.m. UTC | #3
Hi Ido,

On Thu, Jul 20, 2023 at 05:49:47PM +0300, Ido Schimmel wrote:
> Actually, there is another problem here. IPv4 checks that the address is
> indeed gone (it can be assigned to more than one interface):
> 
> + ip link add name dummy1 up type dummy
> + ip link add name dummy2 up type dummy
> + ip link add name dummy3 up type dummy
> + ip address add 192.0.2.1/24 dev dummy1
> + ip address add 192.0.2.1/24 dev dummy2
> + ip route add 198.51.100.0/24 dev dummy3 src 192.0.2.1
> + ip address del 192.0.2.1/24 dev dummy2
> + ip -4 r s
> 192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1 
> 198.51.100.0/24 dev dummy3 scope link src 192.0.2.1 
> 
> But it doesn't happen for IPv6:
> 
> + ip link add name dummy1 up type dummy
> + ip link add name dummy2 up type dummy
> + ip link add name dummy3 up type dummy
> + ip address add 2001:db8:1::1/64 dev dummy1
> + ip address add 2001:db8:1::1/64 dev dummy2
> + ip route add 2001:db8:2::/64 dev dummy3 src 2001:db8:1::1
> + ip address del 2001:db8:1::1/64 dev dummy2
> + ip -6 r s
> 2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
> 2001:db8:2::/64 dev dummy3 metric 1024 pref medium
> fe80::/64 dev dummy1 proto kernel metric 256 pref medium
> fe80::/64 dev dummy2 proto kernel metric 256 pref medium
> fe80::/64 dev dummy3 proto kernel metric 256 pref medium

Hmm, what kind of usage that need to add same address on different interfaces?
BTW, to fix it, how about check if the IPv6 addr still exist. e.g.

--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4590,10 +4590,11 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg)
        struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev;
        struct net *net = ((struct arg_dev_net_ip *)arg)->net;
        struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr;
+       u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN;

-       if (!rt->nh &&
-           ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) &&
-           rt != net->ipv6.fib6_null_entry &&
+       if (rt != net->ipv6.fib6_null_entry &&
+           rt->fib6_table->tb6_id == tb6_id &&
+           !ipv6_chk_addr_and_flags(net, addr, NULL, true, 0, IFA_F_TENTATIVE) &&
            ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) {
                spin_lock_bh(&rt6_exception_lock);
                /* remove prefsrc entry */

Thanks
Hangbin
Ido Schimmel July 23, 2023, 8:13 a.m. UTC | #4
On Fri, Jul 21, 2023 at 04:59:21PM +0800, Hangbin Liu wrote:
> Hi Ido,
> 
> On Thu, Jul 20, 2023 at 05:49:47PM +0300, Ido Schimmel wrote:
> > Actually, there is another problem here. IPv4 checks that the address is
> > indeed gone (it can be assigned to more than one interface):
> > 
> > + ip link add name dummy1 up type dummy
> > + ip link add name dummy2 up type dummy
> > + ip link add name dummy3 up type dummy
> > + ip address add 192.0.2.1/24 dev dummy1
> > + ip address add 192.0.2.1/24 dev dummy2
> > + ip route add 198.51.100.0/24 dev dummy3 src 192.0.2.1
> > + ip address del 192.0.2.1/24 dev dummy2
> > + ip -4 r s
> > 192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1 
> > 198.51.100.0/24 dev dummy3 scope link src 192.0.2.1 
> > 
> > But it doesn't happen for IPv6:
> > 
> > + ip link add name dummy1 up type dummy
> > + ip link add name dummy2 up type dummy
> > + ip link add name dummy3 up type dummy
> > + ip address add 2001:db8:1::1/64 dev dummy1
> > + ip address add 2001:db8:1::1/64 dev dummy2
> > + ip route add 2001:db8:2::/64 dev dummy3 src 2001:db8:1::1
> > + ip address del 2001:db8:1::1/64 dev dummy2
> > + ip -6 r s
> > 2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
> > 2001:db8:2::/64 dev dummy3 metric 1024 pref medium
> > fe80::/64 dev dummy1 proto kernel metric 256 pref medium
> > fe80::/64 dev dummy2 proto kernel metric 256 pref medium
> > fe80::/64 dev dummy3 proto kernel metric 256 pref medium
> 
> Hmm, what kind of usage that need to add same address on different interfaces?

I don't know, but when I checked the code and tested it I noticed that
the kernel doesn't care on which interface the address is configured.
Therefore, in order for deletion to be consistent with addition and with
IPv4, the preferred source address shouldn't be removed from routes in
the VRF table as long as the address is configured on one of the
interfaces in the VRF.

> BTW, to fix it, how about check if the IPv6 addr still exist. e.g.
> 
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4590,10 +4590,11 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg)
>         struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev;
>         struct net *net = ((struct arg_dev_net_ip *)arg)->net;
>         struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr;
> +       u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN;
> 
> -       if (!rt->nh &&
> -           ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) &&
> -           rt != net->ipv6.fib6_null_entry &&
> +       if (rt != net->ipv6.fib6_null_entry &&
> +           rt->fib6_table->tb6_id == tb6_id &&
> +           !ipv6_chk_addr_and_flags(net, addr, NULL, true, 0, IFA_F_TENTATIVE) &&
>             ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) {

ipv6_chk_addr_and_flags() is more expensive than ipv6_addr_equal() so
better to first check that route indeed uses the address as the
preferred source address and only then check if it exists.

Maybe you can even do it in rt6_remove_prefsrc(). That would be similar
to what IPv4 is doing.

>                 spin_lock_bh(&rt6_exception_lock);
>                 /* remove prefsrc entry */
> 
> Thanks
> Hangbin
David Ahern July 23, 2023, 6:12 p.m. UTC | #5
On 7/23/23 2:13 AM, Ido Schimmel wrote:
> 
> I don't know, but when I checked the code and tested it I noticed that
> the kernel doesn't care on which interface the address is configured.
> Therefore, in order for deletion to be consistent with addition and with
> IPv4, the preferred source address shouldn't be removed from routes in
> the VRF table as long as the address is configured on one of the
> interfaces in the VRF.
> 

Deleting routes associated with device 2 when an address is deleted from
device 1 is going to introduce as many problems as it solves. The VRF
use case is one example.
Hangbin Liu July 24, 2023, 9:42 a.m. UTC | #6
On Sun, Jul 23, 2023 at 11:13:36AM +0300, Ido Schimmel wrote:
> > BTW, to fix it, how about check if the IPv6 addr still exist. e.g.
> > 
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -4590,10 +4590,11 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg)
> >         struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev;
> >         struct net *net = ((struct arg_dev_net_ip *)arg)->net;
> >         struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr;
> > +       u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN;
> > 
> > -       if (!rt->nh &&
> > -           ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) &&
> > -           rt != net->ipv6.fib6_null_entry &&
> > +       if (rt != net->ipv6.fib6_null_entry &&
> > +           rt->fib6_table->tb6_id == tb6_id &&
> > +           !ipv6_chk_addr_and_flags(net, addr, NULL, true, 0, IFA_F_TENTATIVE) &&
> >             ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) {
> 
> ipv6_chk_addr_and_flags() is more expensive than ipv6_addr_equal() so
> better to first check that route indeed uses the address as the
> preferred source address and only then check if it exists.

OK.
> 
> Maybe you can even do it in rt6_remove_prefsrc(). That would be similar
> to what IPv4 is doing.

Do you mean call ipv6_chk_addr_and_flags() in rt6_remove_prefsrc()?

Thanks
Hangbin
Ido Schimmel July 25, 2023, 8:06 a.m. UTC | #7
On Mon, Jul 24, 2023 at 05:42:34PM +0800, Hangbin Liu wrote:
> On Sun, Jul 23, 2023 at 11:13:36AM +0300, Ido Schimmel wrote:
> > > BTW, to fix it, how about check if the IPv6 addr still exist. e.g.
> > > 
> > > --- a/net/ipv6/route.c
> > > +++ b/net/ipv6/route.c
> > > @@ -4590,10 +4590,11 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg)
> > >         struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev;
> > >         struct net *net = ((struct arg_dev_net_ip *)arg)->net;
> > >         struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr;
> > > +       u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN;
> > > 
> > > -       if (!rt->nh &&
> > > -           ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) &&
> > > -           rt != net->ipv6.fib6_null_entry &&
> > > +       if (rt != net->ipv6.fib6_null_entry &&
> > > +           rt->fib6_table->tb6_id == tb6_id &&
> > > +           !ipv6_chk_addr_and_flags(net, addr, NULL, true, 0, IFA_F_TENTATIVE) &&
> > >             ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) {
> > 
> > ipv6_chk_addr_and_flags() is more expensive than ipv6_addr_equal() so
> > better to first check that route indeed uses the address as the
> > preferred source address and only then check if it exists.
> 
> OK.
> > 
> > Maybe you can even do it in rt6_remove_prefsrc(). That would be similar
> > to what IPv4 is doing.
> 
> Do you mean call ipv6_chk_addr_and_flags() in rt6_remove_prefsrc()?

Yes.
Ido Schimmel July 25, 2023, 10:06 a.m. UTC | #8
On Sun, Jul 23, 2023 at 12:12:00PM -0600, David Ahern wrote:
> On 7/23/23 2:13 AM, Ido Schimmel wrote:
> > 
> > I don't know, but when I checked the code and tested it I noticed that
> > the kernel doesn't care on which interface the address is configured.
> > Therefore, in order for deletion to be consistent with addition and with
> > IPv4, the preferred source address shouldn't be removed from routes in
> > the VRF table as long as the address is configured on one of the
> > interfaces in the VRF.
> > 
> 
> Deleting routes associated with device 2 when an address is deleted from
> device 1 is going to introduce as many problems as it solves. The VRF
> use case is one example.

This already happens in IPv4:

# ip link add name dummy1 up type dummy
# ip link add name dummy2 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route add 198.51.100.0/24 dev dummy2 src 192.0.2.1
# ip -4 r s
192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1 
198.51.100.0/24 dev dummy2 scope link src 192.0.2.1 
# ip address del 192.0.2.1/24 dev dummy1
# ip -4 r s

IPv6 only removes the preferred source address from routes, but doesn't
delete them. The patch doesn't change that.

Another difference from IPv4 is that IPv6 only removes the preferred
source address from routes whose first nexthop device matches the device
from which the address was removed:

# ip link add name dummy1 up type dummy
# ip link add name dummy2 up type dummy
# ip address add 2001:db8:1::1/64 dev dummy1
# ip route add 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1
# ip -6 r s
2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
2001:db8:2::/64 dev dummy2 src 2001:db8:1::1 metric 1024 pref medium
fe80::/64 dev dummy1 proto kernel metric 256 pref medium
fe80::/64 dev dummy2 proto kernel metric 256 pref medium
# ip address del 2001:db8:1::1/64 dev dummy1
# ip -6 r s
2001:db8:2::/64 dev dummy2 src 2001:db8:1::1 metric 1024 pref medium
fe80::/64 dev dummy1 proto kernel metric 256 pref medium
fe80::/64 dev dummy2 proto kernel metric 256 pref medium

And this is despite the fact that the kernel only allowed the route to
be programmed because the preferred source address was present on
another interface in the same L3 domain / VRF:

# ip link add name dummy1 up type dummy
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1
Error: Invalid source address.

The intent of the patch (at least with the changes I proposed) is to
remove the preferred source address from routes in a VRF when the
address is no longer configured on any interface in the VRF.

Note that the above is true for addresses with a global scope. The
removal of a link-local address from a device should not affect other
devices. This restriction also applies when a route is added:

# ip link add name dummy1 up type dummy
# ip link add name dummy2 up type dummy
# ip -6 address add fe80::1/64 dev dummy1
# ip -6 route add 2001:db8:2::/64 dev dummy2 src fe80::1
Error: Invalid source address.
# ip -6 address add fe80::1/64 dev dummy2
# ip -6 route add 2001:db8:2::/64 dev dummy2 src fe80::1
David Ahern July 25, 2023, 10:37 p.m. UTC | #9
On 7/25/23 4:06 AM, Ido Schimmel wrote:
> On Sun, Jul 23, 2023 at 12:12:00PM -0600, David Ahern wrote:
>> On 7/23/23 2:13 AM, Ido Schimmel wrote:
>>>
>>> I don't know, but when I checked the code and tested it I noticed that
>>> the kernel doesn't care on which interface the address is configured.
>>> Therefore, in order for deletion to be consistent with addition and with
>>> IPv4, the preferred source address shouldn't be removed from routes in
>>> the VRF table as long as the address is configured on one of the
>>> interfaces in the VRF.
>>>
>>
>> Deleting routes associated with device 2 when an address is deleted from
>> device 1 is going to introduce as many problems as it solves. The VRF
>> use case is one example.
> 
> This already happens in IPv4:
> 
> # ip link add name dummy1 up type dummy
> # ip link add name dummy2 up type dummy
> # ip address add 192.0.2.1/24 dev dummy1
> # ip route add 198.51.100.0/24 dev dummy2 src 192.0.2.1
> # ip -4 r s
> 192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1 
> 198.51.100.0/24 dev dummy2 scope link src 192.0.2.1 
> # ip address del 192.0.2.1/24 dev dummy1
> # ip -4 r s
> 
> IPv6 only removes the preferred source address from routes, but doesn't
> delete them. The patch doesn't change that.
> 
> Another difference from IPv4 is that IPv6 only removes the preferred
> source address from routes whose first nexthop device matches the device
> from which the address was removed:
> 
> # ip link add name dummy1 up type dummy
> # ip link add name dummy2 up type dummy
> # ip address add 2001:db8:1::1/64 dev dummy1
> # ip route add 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1
> # ip -6 r s
> 2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
> 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1 metric 1024 pref medium
> fe80::/64 dev dummy1 proto kernel metric 256 pref medium
> fe80::/64 dev dummy2 proto kernel metric 256 pref medium
> # ip address del 2001:db8:1::1/64 dev dummy1
> # ip -6 r s
> 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1 metric 1024 pref medium
> fe80::/64 dev dummy1 proto kernel metric 256 pref medium
> fe80::/64 dev dummy2 proto kernel metric 256 pref medium
> 
> And this is despite the fact that the kernel only allowed the route to
> be programmed because the preferred source address was present on
> another interface in the same L3 domain / VRF:
> 
> # ip link add name dummy1 up type dummy
> # ip link add name dummy2 up type dummy
> # ip route add 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1
> Error: Invalid source address.
> 
> The intent of the patch (at least with the changes I proposed) is to
> remove the preferred source address from routes in a VRF when the
> address is no longer configured on any interface in the VRF.
> 
> Note that the above is true for addresses with a global scope. The
> removal of a link-local address from a device should not affect other
> devices. This restriction also applies when a route is added:
> 
> # ip link add name dummy1 up type dummy
> # ip link add name dummy2 up type dummy
> # ip -6 address add fe80::1/64 dev dummy1
> # ip -6 route add 2001:db8:2::/64 dev dummy2 src fe80::1
> Error: Invalid source address.
> # ip -6 address add fe80::1/64 dev dummy2
> # ip -6 route add 2001:db8:2::/64 dev dummy2 src fe80::1

Lot of permutations. It would be good to get these in a test script
along with other variations - e.g.,

# 2 devices with the same source address
ip link add name dummy1 up type dummy
ip link add name dummy2 up type dummy
ip link add name dummy3 up type dummy
ip address add 192.0.2.1/24 dev dummy1
ip address add 192.0.2.1/24 dev dummy3
ip route add 198.51.100.0/24 dev dummy2 src 192.0.2.1
ip address del 192.0.2.1/24 dev dummy1
--> src route should stay

# VRF with single device using src address
ip li add name red up type vrf table 123
ip link add name dummy4 up type dummy vrf red
ip link add name dummy5 up type dummy vrf red
ip address add 192.0.2.1/24 dev dummy4
ip route add 198.51.100.0/24 dev dummy5 src 192.0.2.1
ip address del 192.0.2.1/24 dev dummy4
ip ro ls vrf red

# VRF with two devices using src address
ip li add name red up type vrf table 123
ip link add name dummy4 up vrf red type dummy
ip link add name dummy5 up vrf red type dummy
ip link add name dummy6 up vrf red type dummy
ip address add 192.0.2.1/24 dev dummy4
ip address add 192.0.2.1/24 dev dummy6
ip route add 198.51.100.0/24 dev dummy5 src 192.0.2.1 vrf red
ip address del 192.0.2.1/24 dev dummy4


I can not find my notes but I recall Donald raised a ticket at Cumulus
when FRR tripped over a scenario like this or a related one (something
about routes and address delete). CC'ed Donald in case he recalls the
details
Hangbin Liu July 26, 2023, 9:46 a.m. UTC | #10
On Tue, Jul 25, 2023 at 04:37:15PM -0600, David Ahern wrote:
> > This already happens in IPv4:
> > 
> > # ip link add name dummy1 up type dummy
> > # ip link add name dummy2 up type dummy
> > # ip address add 192.0.2.1/24 dev dummy1
> > # ip route add 198.51.100.0/24 dev dummy2 src 192.0.2.1
> > # ip -4 r s
> > 192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1 
> > 198.51.100.0/24 dev dummy2 scope link src 192.0.2.1 
> > # ip address del 192.0.2.1/24 dev dummy1
> > # ip -4 r s
> > 
> > IPv6 only removes the preferred source address from routes, but doesn't
> > delete them. The patch doesn't change that.
> > 
> > Another difference from IPv4 is that IPv6 only removes the preferred
> > source address from routes whose first nexthop device matches the device
> > from which the address was removed:
> > 
> > # ip link add name dummy1 up type dummy
> > # ip link add name dummy2 up type dummy
> > # ip address add 2001:db8:1::1/64 dev dummy1
> > # ip route add 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1
> > # ip -6 r s
> > 2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
> > 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1 metric 1024 pref medium
> > fe80::/64 dev dummy1 proto kernel metric 256 pref medium
> > fe80::/64 dev dummy2 proto kernel metric 256 pref medium
> > # ip address del 2001:db8:1::1/64 dev dummy1
> > # ip -6 r s
> > 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1 metric 1024 pref medium
> > fe80::/64 dev dummy1 proto kernel metric 256 pref medium
> > fe80::/64 dev dummy2 proto kernel metric 256 pref medium
> > 
> > And this is despite the fact that the kernel only allowed the route to
> > be programmed because the preferred source address was present on
> > another interface in the same L3 domain / VRF:
> > 
> > # ip link add name dummy1 up type dummy
> > # ip link add name dummy2 up type dummy
> > # ip route add 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1
> > Error: Invalid source address.
> > 
> > The intent of the patch (at least with the changes I proposed) is to
> > remove the preferred source address from routes in a VRF when the
> > address is no longer configured on any interface in the VRF.
> > 
> > Note that the above is true for addresses with a global scope. The
> > removal of a link-local address from a device should not affect other
> > devices. This restriction also applies when a route is added:
> > 
> > # ip link add name dummy1 up type dummy
> > # ip link add name dummy2 up type dummy
> > # ip -6 address add fe80::1/64 dev dummy1
> > # ip -6 route add 2001:db8:2::/64 dev dummy2 src fe80::1
> > Error: Invalid source address.
> > # ip -6 address add fe80::1/64 dev dummy2
> > # ip -6 route add 2001:db8:2::/64 dev dummy2 src fe80::1
> 
> Lot of permutations. It would be good to get these in a test script
> along with other variations - e.g.,
> 
> # 2 devices with the same source address
> ip link add name dummy1 up type dummy
> ip link add name dummy2 up type dummy
> ip link add name dummy3 up type dummy
> ip address add 192.0.2.1/24 dev dummy1
> ip address add 192.0.2.1/24 dev dummy3
> ip route add 198.51.100.0/24 dev dummy2 src 192.0.2.1
> ip address del 192.0.2.1/24 dev dummy1
> --> src route should stay
> 
> # VRF with single device using src address
> ip li add name red up type vrf table 123
> ip link add name dummy4 up type dummy vrf red
> ip link add name dummy5 up type dummy vrf red
> ip address add 192.0.2.1/24 dev dummy4
> ip route add 198.51.100.0/24 dev dummy5 src 192.0.2.1
> ip address del 192.0.2.1/24 dev dummy4
> ip ro ls vrf red
> 
> # VRF with two devices using src address
> ip li add name red up type vrf table 123
> ip link add name dummy4 up vrf red type dummy
> ip link add name dummy5 up vrf red type dummy
> ip link add name dummy6 up vrf red type dummy
> ip address add 192.0.2.1/24 dev dummy4
> ip address add 192.0.2.1/24 dev dummy6
> ip route add 198.51.100.0/24 dev dummy5 src 192.0.2.1 vrf red
> ip address del 192.0.2.1/24 dev dummy4

I can add these to fib_tests later.

Hangbin
diff mbox series

Patch

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 64e873f5895f..773285adc17c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4590,10 +4590,10 @@  static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg)
 	struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev;
 	struct net *net = ((struct arg_dev_net_ip *)arg)->net;
 	struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr;
+	u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN;
 
-	if (!rt->nh &&
-	    ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) &&
-	    rt != net->ipv6.fib6_null_entry &&
+	if (rt != net->ipv6.fib6_null_entry &&
+	    rt->fib6_table->tb6_id == tb6_id &&
 	    ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) {
 		spin_lock_bh(&rt6_exception_lock);
 		/* remove prefsrc entry */
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 35d89dfa6f11..bf11edc17dfc 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -9,7 +9,7 @@  ret=0
 ksft_skip=4
 
 # all tests in this script. Can be overridden with -t option
-TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh"
+TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr ipv6_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh"
 
 VERBOSE=0
 PAUSE_ON_FAIL=no
@@ -1869,6 +1869,96 @@  ipv4_del_addr_test()
 	cleanup
 }
 
+ipv6_del_addr_test()
+{
+	echo
+	echo "IPv6 delete address route tests"
+
+	setup
+
+	set -e
+	$IP li add dummy1 type dummy
+	$IP li set dummy1 up
+	$IP li add dummy2 type dummy
+	$IP li set dummy2 up
+	$IP li add red type vrf table 1111
+	$IP li set red up
+	$IP ro add vrf red unreachable default
+	$IP li set dummy2 vrf red
+
+	$IP addr add dev dummy1 2001:db8:104::1/64
+	$IP addr add dev dummy1 2001:db8:104::11/64
+	$IP addr add dev dummy1 2001:db8:104::12/64
+	$IP addr add dev dummy1 2001:db8:104::13/64
+	$IP addr add dev dummy2 2001:db8:104::1/64
+	$IP addr add dev dummy2 2001:db8:104::11/64
+	$IP addr add dev dummy2 2001:db8:104::12/64
+	$IP route add 2001:db8:105::/64 via 2001:db8:104::2 src 2001:db8:104::11
+	$IP route add 2001:db8:106::/64 dev lo src 2001:db8:104::12
+	$IP route add table 0 2001:db8:107::/64 via 2001:db8:104::2 src 2001:db8:104::13
+	$IP route add vrf red 2001:db8:105::/64 via 2001:db8:104::2 src 2001:db8:104::11
+	$IP route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12
+	set +e
+
+	# removing address from device in vrf should only remove it as a
+	# preferred source address from routes in vrf table
+	echo "    Regular FIB info"
+
+	$IP addr del dev dummy2 2001:db8:104::11/64
+	# Checking if the source address exists instead of the dest subnet
+	# as IPv6 only removes the preferred source address, not whole route.
+	$IP -6 ro ls vrf red | grep -q "src 2001:db8:104::11"
+	log_test $? 1 "Prefsrc removed from VRF when source address deleted"
+
+	$IP -6 ro ls | grep -q " src 2001:db8:104::11"
+	log_test $? 0 "Prefsrc in default VRF not removed"
+
+	$IP addr add dev dummy2 2001:db8:104::11/64
+	$IP route replace vrf red 2001:db8:105::/64 via 2001:db8:104::2 src 2001:db8:104::11
+
+	$IP addr del dev dummy1 2001:db8:104::11/64
+	$IP -6 ro ls | grep -q "src 2001:db8:104::11"
+	log_test $? 1 "Prefsrc removed in default VRF when source address deleted"
+
+	$IP -6 ro ls vrf red | grep -q "src 2001:db8:104::11"
+	log_test $? 0 "Prefsrc in VRF is not removed by address delete"
+
+	# removing address from device in vrf should only remove preferred
+	# source address from vrf table even when the associated fib info
+	# only differs in table ID
+	echo "    Identical FIB info with different table ID"
+
+	$IP addr del dev dummy2 2001:db8:104::12/64
+	$IP -6 ro ls vrf red | grep -q "src 2001:db8:104::12"
+	log_test $? 1 "Prefsrc removed from VRF when source address deleted"
+
+	$IP -6 ro ls | grep -q "src 2001:db8:104::12"
+	log_test $? 0 "Prefsrc in default VRF not removed"
+
+	$IP addr add dev dummy2 2001:db8:104::12/64
+	$IP route replace vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12
+
+	$IP addr del dev dummy1 2001:db8:104::12/64
+	$IP -6 ro ls | grep -q "src 2001:db8:104::12"
+	log_test $? 1 "Prefsrc removed in default VRF when source address deleted"
+
+	$IP -6 ro ls vrf red | grep -q "src 2001:db8:104::12"
+	log_test $? 0 "Prefsrc in VRF is not removed by address delete"
+
+	# removing address from device in default vrf should remove preferred
+	# source address from the default vrf even when route was inserted
+	# with a table ID of 0.
+	echo "    Table ID 0"
+
+	$IP addr del dev dummy1 2001:db8:104::13/64
+	$IP -6 ro ls | grep -q "src 2001:db8:104::13"
+	log_test $? 1 "Prefsrc removed in default VRF when source address deleted"
+
+	$IP li del dummy1
+	$IP li del dummy2
+	cleanup
+}
+
 
 ipv4_route_v6_gw_test()
 {
@@ -2211,6 +2301,7 @@  do
 	ipv6_addr_metric)		ipv6_addr_metric_test;;
 	ipv4_addr_metric)		ipv4_addr_metric_test;;
 	ipv4_del_addr)			ipv4_del_addr_test;;
+	ipv6_del_addr)			ipv6_del_addr_test;;
 	ipv6_route_metrics)		ipv6_route_metrics_test;;
 	ipv4_route_metrics)		ipv4_route_metrics_test;;
 	ipv4_route_v6_gw)		ipv4_route_v6_gw_test;;