[net,2/2] ipv6: Do not consider link down nexthops in path selection

Message ID: 20250402114224.293392-3-idosch@nvidia.com
State: Accepted
Commit: 8b8e0dd357165e0258d9f9cdab5366720ed2f619
Delegated to: Netdev Maintainers
Series: ipv6: Multipath routing fixes

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           warning  1 maintainers not CCed: kuba@kernel.org
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1 this patch: 1
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 24 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2025-04-04--00-00 (tests: 911)

Commit Message

Ido Schimmel April 2, 2025, 11:42 a.m. UTC
Nexthops whose link is down are not supposed to be considered during
path selection when the "ignore_routes_with_linkdown" sysctl is set.
This is done by assigning them a negative region boundary.

However, when comparing the computed hash (unsigned) with the region
boundary (signed), the negative region boundary is treated as unsigned,
resulting in incorrect nexthop selection.

Fix by treating the computed hash as signed. Note that the computed hash
is always in range of [0, 2^31 - 1].

Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 net/ipv6/route.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
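
The signedness problem described in the commit message can be reproduced
outside the kernel. The following is a minimal userspace sketch, not the
kernel code: the helper names in_region_unsigned() and in_region_signed()
are hypothetical, and the explicit cast in the first helper only spells out
the conversion that C would otherwise apply implicitly when a 32-bit
unsigned value is compared with an int.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pre-fix behaviour: comparing a u32 hash with an int bound converts the
 * bound to unsigned (written out here as an explicit cast), so -1 turns
 * into UINT32_MAX and every hash compares as "in region".
 */
static int in_region_unsigned(uint32_t mp_hash, int upper_bound)
{
	return mp_hash <= (uint32_t)upper_bound;
}

/*
 * Post-fix behaviour: copy the hash into an int first so the comparison
 * is signed. This relies on the hash never exceeding 2^31 - 1.
 */
static int in_region_signed(uint32_t mp_hash, int upper_bound)
{
	int hash;

	assert(mp_hash <= INT32_MAX);
	hash = (int)mp_hash;
	return hash <= upper_bound;
}
```

With a boundary of -1 (the value used here to stand for a link down
nexthop), in_region_unsigned(12345, -1) reports a match while
in_region_signed(12345, -1) does not. For non-negative boundaries the two
helpers agree.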

Comments

Willem de Bruijn April 4, 2025, 1:22 p.m. UTC | #1
Ido Schimmel wrote:
> Nexthops whose link is down are not supposed to be considered during
> path selection when the "ignore_routes_with_linkdown" sysctl is set.
> This is done by assigning them a negative region boundary.
> 
> However, when comparing the computed hash (unsigned) with the region
> boundary (signed), the negative region boundary is treated as unsigned,
> resulting in incorrect nexthop selection.
> 
> Fix by treating the computed hash as signed. Note that the computed hash
> is always in range of [0, 2^31 - 1].
> 
> Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/ipv6/route.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 864f0002034b..ab12b816ab94 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -442,6 +442,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
>  {
>  	struct fib6_info *first, *match = res->f6i;
>  	struct fib6_info *sibling;
> +	int hash;
>  
>  	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
>  		goto out;
> @@ -468,7 +469,8 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
>  	if (!first)
>  		goto out;
>  
> -	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> +	hash = fl6->mp_hash;
> +	if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&

The combined upper bounds add up to the total weights of the paths.

Should hash be scaled (using reciprocal_scale) to that bound to have
a uniform random distribution across all weights?

Else a hash in the range [0, 2^31 - 1] is unlikely to fall within the
total weights range.

>  	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
>  			    strict) >= 0) {
>  		match = first;
> @@ -481,7 +483,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
>  		int nh_upper_bound;
>  
>  		nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
> -		if (fl6->mp_hash > nh_upper_bound)
> +		if (hash > nh_upper_bound)
>  			continue;
>  		if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
>  			break;
> -- 
> 2.49.0
>
Willem de Bruijn April 4, 2025, 2:03 p.m. UTC | #2
Willem de Bruijn wrote:
> Ido Schimmel wrote:
> > Nexthops whose link is down are not supposed to be considered during
> > path selection when the "ignore_routes_with_linkdown" sysctl is set.
> > This is done by assigning them a negative region boundary.
> > 
> > However, when comparing the computed hash (unsigned) with the region
> > boundary (signed), the negative region boundary is treated as unsigned,
> > resulting in incorrect nexthop selection.
> > 
> > Fix by treating the computed hash as signed. Note that the computed hash
> > is always in range of [0, 2^31 - 1].
> > 
> > Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
> > Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> > ---
> >  net/ipv6/route.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 864f0002034b..ab12b816ab94 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -442,6 +442,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> >  {
> >  	struct fib6_info *first, *match = res->f6i;
> >  	struct fib6_info *sibling;
> > +	int hash;
> >  
> >  	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
> >  		goto out;
> > @@ -468,7 +469,8 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> >  	if (!first)
> >  		goto out;
> >  
> > -	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> > +	hash = fl6->mp_hash;
> > +	if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> 
> The combined upper bounds add up to the total weights of the paths.
> 
> Should hash be scaled (using reciprocal_scale) to that bound to have
> a uniform random distribution across all weights?
> 
> Else a hash in the range [0, 2^31 - 1] is unlikely to fall within the
> total weights range.

Never mind, the scaling is handled in rt6_upper_bound_set. Where
weights are scaled to cover the [0, INT_MAX - 1] range.

I confused fib_nh_weight with fib_nh_upper_bound.

But should U32 hash then be truncated to the lower 31 bits, to
drop the sign and negative half of the space when used as int?

> >  	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
> >  			    strict) >= 0) {
> >  		match = first;
> > @@ -481,7 +483,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> >  		int nh_upper_bound;
> >  
> >  		nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
> > -		if (fl6->mp_hash > nh_upper_bound)
> > +		if (hash > nh_upper_bound)
> >  			continue;
> >  		if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
> >  			break;
> > -- 
> > 2.49.0
> > 
> 
>
Willem de Bruijn April 4, 2025, 2:07 p.m. UTC | #3
Willem de Bruijn wrote:
> Willem de Bruijn wrote:
> > Ido Schimmel wrote:
> > > Nexthops whose link is down are not supposed to be considered during
> > > path selection when the "ignore_routes_with_linkdown" sysctl is set.
> > > This is done by assigning them a negative region boundary.
> > > 
> > > However, when comparing the computed hash (unsigned) with the region
> > > boundary (signed), the negative region boundary is treated as unsigned,
> > > resulting in incorrect nexthop selection.
> > > 
> > > Fix by treating the computed hash as signed. Note that the computed hash
> > > is always in range of [0, 2^31 - 1].
> > > 
> > > Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
> > > Signed-off-by: Ido Schimmel <idosch@nvidia.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

> > > ---
> > >  net/ipv6/route.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > > index 864f0002034b..ab12b816ab94 100644
> > > --- a/net/ipv6/route.c
> > > +++ b/net/ipv6/route.c
> > > @@ -442,6 +442,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> > >  {
> > >  	struct fib6_info *first, *match = res->f6i;
> > >  	struct fib6_info *sibling;
> > > +	int hash;
> > >  
> > >  	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
> > >  		goto out;
> > > @@ -468,7 +469,8 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> > >  	if (!first)
> > >  		goto out;
> > >  
> > > -	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> > > +	hash = fl6->mp_hash;
> > > +	if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> > 
> > The combined upper bounds add up to the total weights of the paths.
> > 
> > Should hash be scaled (using reciprocal_scale) to that bound to have
> > a uniform random distribution across all weights?
> > 
> > Else a hash in the range [0, 2^31 - 1] is unlikely to fall within the
> > total weights range.
> 
> Never mind, the scaling is handled in rt6_upper_bound_set. Where
> weights are scaled to cover the [0, INT_MAX - 1] range.
> 
> I confused fib_nh_weight with fib_nh_upper_bound.
> 
> But should U32 hash then be truncated to the lower 31 bits, to
> drop the sign and negative half of the space when used as int?

And you document this in the commit message: "Note that the computed
hash is always in range of [0, 2^31 - 1]".

That is the `mhash >> 1` at the bottom of rt6_multipath_hash.

Sorry, I'm a bit slow in internalizing this code. And perhaps a bit
too fast at responding ;) But got it now!

 
> > >  	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
> > >  			    strict) >= 0) {
> > >  		match = first;
> > > @@ -481,7 +483,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> > >  		int nh_upper_bound;
> > >  
> > >  		nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
> > > -		if (fl6->mp_hash > nh_upper_bound)
> > > +		if (hash > nh_upper_bound)
> > >  			continue;
> > >  		if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
> > >  			break;
> > > -- 
> > > 2.49.0
> > > 
> > 
> > 
> 
>
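
The points settled in the thread above (boundaries are scaled from the
weights, link down nexthops get a negative boundary, and the hash is
shifted right by one so it fits in 31 bits) can be put together in a small
userspace sketch. This is an illustration under assumptions, not the code
in net/ipv6/route.c: struct toy_nh and the toy_*() and example_pick()
helpers are hypothetical, and the rounding used for the boundary
(cumulative weight scaled by 2^31, rounded to nearest, minus one) is an
assumption about what rt6_upper_bound_set does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-nexthop state discussed above. */
struct toy_nh {
	int weight;      /* configured weight, must be positive */
	int dead;        /* non-zero when the link is down */
	int upper_bound; /* region boundary, -1 for dead nexthops */
};

/* Assign each live nexthop a cumulative boundary in [0, 2^31 - 1]. */
static void toy_set_upper_bounds(struct toy_nh *nhs, int n)
{
	int total = 0, acc = 0, i;

	for (i = 0; i < n; i++) {
		assert(nhs[i].weight > 0);
		if (!nhs[i].dead)
			total += nhs[i].weight;
	}

	for (i = 0; i < n; i++) {
		int ub = -1; /* dead nexthops never match a hash >= 0 */

		if (!nhs[i].dead) {
			uint64_t scaled;

			acc += nhs[i].weight;
			/* round-to-nearest division, then subtract one */
			scaled = (((uint64_t)acc << 31) + total / 2) / total;
			ub = (int)(scaled - 1);
		}
		nhs[i].upper_bound = ub;
	}
}

/* Signed comparison, as in the fix: first region containing the hash. */
static int toy_select(const struct toy_nh *nhs, int n, int hash)
{
	int i;

	for (i = 0; i < n; i++)
		if (hash <= nhs[i].upper_bound)
			return i;
	return -1;
}

/* Three nexthops with weights 1, 1, 2; the middle one is link down. */
static int example_pick(uint32_t raw_hash)
{
	struct toy_nh nhs[] = {
		{ .weight = 1, .dead = 0 },
		{ .weight = 1, .dead = 1 },
		{ .weight = 2, .dead = 0 },
	};

	toy_set_upper_bounds(nhs, 3);
	/* drop one bit so the hash is in [0, 2^31 - 1] */
	return toy_select(nhs, 3, (int)(raw_hash >> 1));
}
```

In this sketch example_pick() returns index 0 or 2 for any 32-bit input
and never index 1, because the signed comparison keeps the -1 boundary
below every possible hash value.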
Patch

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 864f0002034b..ab12b816ab94 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -442,6 +442,7 @@  void fib6_select_path(const struct net *net, struct fib6_result *res,
 {
 	struct fib6_info *first, *match = res->f6i;
 	struct fib6_info *sibling;
+	int hash;
 
 	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
 		goto out;
@@ -468,7 +469,8 @@  void fib6_select_path(const struct net *net, struct fib6_result *res,
 	if (!first)
 		goto out;
 
-	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
+	hash = fl6->mp_hash;
+	if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
 	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
 			    strict) >= 0) {
 		match = first;
@@ -481,7 +483,7 @@  void fib6_select_path(const struct net *net, struct fib6_result *res,
 		int nh_upper_bound;
 
 		nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
-		if (fl6->mp_hash > nh_upper_bound)
+		if (hash > nh_upper_bound)
 			continue;
 		if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
 			break;