Message ID | 20250402114224.293392-3-idosch@nvidia.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 8b8e0dd357165e0258d9f9cdab5366720ed2f619 |
Delegated to: | Netdev Maintainers |
Series | ipv6: Multipath routing fixes |
Ido Schimmel wrote:
> Nexthops whose link is down are not supposed to be considered during
> path selection when the "ignore_routes_with_linkdown" sysctl is set.
> This is done by assigning them a negative region boundary.
>
> However, when comparing the computed hash (unsigned) with the region
> boundary (signed), the negative region boundary is treated as unsigned,
> resulting in incorrect nexthop selection.
>
> Fix by treating the computed hash as signed. Note that the computed hash
> is always in range of [0, 2^31 - 1].
>
> Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/ipv6/route.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 864f0002034b..ab12b816ab94 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -442,6 +442,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
>  {
>  	struct fib6_info *first, *match = res->f6i;
>  	struct fib6_info *sibling;
> +	int hash;
>
>  	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
>  		goto out;
> @@ -468,7 +469,8 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
>  	if (!first)
>  		goto out;
>
> -	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> +	hash = fl6->mp_hash;
> +	if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&

The combined upper bounds add up to the total weights of the paths.

Should hash be scaled (using reciprocal_scale) to that bound to have
a uniform random distribution across all weights?

Else a hash in the range [0, 2^31 - 1] is unlikely to fall within the
total weights range.

>  	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
>  			    strict) >= 0) {
>  		match = first;
> @@ -481,7 +483,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
>  		int nh_upper_bound;
>
>  		nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
> -		if (fl6->mp_hash > nh_upper_bound)
> +		if (hash > nh_upper_bound)
>  			continue;
>  		if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
>  			break;
> --
> 2.49.0
>

Willem de Bruijn wrote:
> Ido Schimmel wrote:
> > Nexthops whose link is down are not supposed to be considered during
> > path selection when the "ignore_routes_with_linkdown" sysctl is set.
> > This is done by assigning them a negative region boundary.
> >
> > However, when comparing the computed hash (unsigned) with the region
> > boundary (signed), the negative region boundary is treated as unsigned,
> > resulting in incorrect nexthop selection.
> >
> > Fix by treating the computed hash as signed. Note that the computed hash
> > is always in range of [0, 2^31 - 1].
> >
> > Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
> > Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> > ---
> >  net/ipv6/route.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 864f0002034b..ab12b816ab94 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -442,6 +442,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> >  {
> >  	struct fib6_info *first, *match = res->f6i;
> >  	struct fib6_info *sibling;
> > +	int hash;
> >
> >  	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
> >  		goto out;
> > @@ -468,7 +469,8 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> >  	if (!first)
> >  		goto out;
> >
> > -	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> > +	hash = fl6->mp_hash;
> > +	if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
>
> The combined upper bounds add up to the total weights of the paths.
>
> Should hash be scaled (using reciprocal_scale) to that bound to have
> a uniform random distribution across all weights?
>
> Else a hash in the range [0, 2^31 - 1] is unlikely to fall within the
> total weights range.

Never mind, the scaling is handled in rt6_upper_bound_set. Where
weights are scaled to cover the [0, INT_MAX - 1] range.

I confused fib_nh_weight with fib_nh_upper_bound.

But should U32 hash then be truncated to the lower 31 bits, to
drop the sign and negative half of the space when used as int?

> >  	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
> >  			    strict) >= 0) {
> >  		match = first;
> > @@ -481,7 +483,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> >  		int nh_upper_bound;
> >
> >  		nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
> > -		if (fl6->mp_hash > nh_upper_bound)
> > +		if (hash > nh_upper_bound)
> >  			continue;
> >  		if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
> >  			break;
> > --
> > 2.49.0
> >
>

Willem de Bruijn wrote:
> Willem de Bruijn wrote:
> > Ido Schimmel wrote:
> > > Nexthops whose link is down are not supposed to be considered during
> > > path selection when the "ignore_routes_with_linkdown" sysctl is set.
> > > This is done by assigning them a negative region boundary.
> > >
> > > However, when comparing the computed hash (unsigned) with the region
> > > boundary (signed), the negative region boundary is treated as unsigned,
> > > resulting in incorrect nexthop selection.
> > >
> > > Fix by treating the computed hash as signed. Note that the computed hash
> > > is always in range of [0, 2^31 - 1].
> > >
> > > Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
> > > Signed-off-by: Ido Schimmel <idosch@nvidia.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

> > > ---
> > >  net/ipv6/route.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > > index 864f0002034b..ab12b816ab94 100644
> > > --- a/net/ipv6/route.c
> > > +++ b/net/ipv6/route.c
> > > @@ -442,6 +442,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> > >  {
> > >  	struct fib6_info *first, *match = res->f6i;
> > >  	struct fib6_info *sibling;
> > > +	int hash;
> > >
> > >  	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
> > >  		goto out;
> > > @@ -468,7 +469,8 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> > >  	if (!first)
> > >  		goto out;
> > >
> > > -	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> > > +	hash = fl6->mp_hash;
> > > +	if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> >
> > The combined upper bounds add up to the total weights of the paths.
> >
> > Should hash be scaled (using reciprocal_scale) to that bound to have
> > a uniform random distribution across all weights?
> >
> > Else a hash in the range [0, 2^31 - 1] is unlikely to fall within the
> > total weights range.
>
> Never mind, the scaling is handled in rt6_upper_bound_set. Where
> weights are scaled to cover the [0, INT_MAX - 1] range.
>
> I confused fib_nh_weight with fib_nh_upper_bound.
>
> But should U32 hash then be truncated to the lower 31 bits, to
> drop the sign and negative half of the space when used as int?

And you document this in the commit message: "Note that the computed
hash is always in range of [0, 2^31 - 1]". That is the `mhash >> 1`
at the bottom of rt6_multipath_hash.

Sorry, I'm a bit slow in internalizing this code. And perhaps a bit
too fast at responding ;) But got it now!

> > >  	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
> > >  			    strict) >= 0) {
> > >  		match = first;
> > > @@ -481,7 +483,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> > >  		int nh_upper_bound;
> > >
> > >  		nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
> > > -		if (fl6->mp_hash > nh_upper_bound)
> > > +		if (hash > nh_upper_bound)
> > >  			continue;
> > >  		if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
> > >  			break;
> > > --
> > > 2.49.0
> > >
> >
>

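The two points settled in the thread (weights are scaled into the 31-bit hash space, and the hash is shifted down to 31 bits) can be illustrated with a small user-space sketch. This is not the kernel code: `struct nexthop`, `set_upper_bounds()` and `select_nexthop()` are hypothetical names, the rounding formula is an assumption about how such scaling is typically done, and the route scoring done by `rt6_score_route()` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a nexthop; not the kernel's struct fib6_nh. */
struct nexthop {
	int weight;      /* configured weight, assumed >= 0 */
	int link_up;     /* 0 if the link is down */
	int upper_bound; /* region boundary; -1 means "never select" */
};

/*
 * Assign cumulative region boundaries scaled to the 31-bit hash space.
 * Nexthops whose link is down get a negative boundary so that no
 * non-negative hash can fall into their region.
 */
static void set_upper_bounds(struct nexthop *nh, size_t n)
{
	int64_t total = 0, acc = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(nh[i].weight >= 0);
		if (nh[i].link_up)
			total += nh[i].weight;
	}

	for (i = 0; i < n; i++) {
		if (!nh[i].link_up || total == 0) {
			nh[i].upper_bound = -1;
			continue;
		}
		acc += nh[i].weight;
		/* round(acc * 2^31 / total) - 1; the last live nexthop gets 2^31 - 1 */
		nh[i].upper_bound = (int)(((acc << 31) + total / 2) / total - 1);
	}
}

/* Return the index of the selected nexthop, or -1 if none is usable. */
static int select_nexthop(const struct nexthop *nh, size_t n, uint32_t raw_hash)
{
	/* Drop one bit so the value is in [0, 2^31 - 1] and safe to use as int. */
	int hash = (int)(raw_hash >> 1);
	size_t i;

	for (i = 0; i < n; i++) {
		if (hash <= nh[i].upper_bound) /* signed comparison */
			return (int)i;
	}
	return -1;
}
```

With three equal-weight nexthops and the middle one down, the boundaries come out as 2^30 - 1, -1 and 2^31 - 1, so the middle nexthop's region is empty and the 31-bit hash space is split evenly between the other two.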
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 864f0002034b..ab12b816ab94 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -442,6 +442,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
 {
 	struct fib6_info *first, *match = res->f6i;
 	struct fib6_info *sibling;
+	int hash;

 	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
 		goto out;
@@ -468,7 +469,8 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
 	if (!first)
 		goto out;

-	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
+	hash = fl6->mp_hash;
+	if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
 	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
 			    strict) >= 0) {
 		match = first;
@@ -481,7 +483,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
 		int nh_upper_bound;

 		nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
-		if (fl6->mp_hash > nh_upper_bound)
+		if (hash > nh_upper_bound)
 			continue;
 		if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
 			break;

Nexthops whose link is down are not supposed to be considered during
path selection when the "ignore_routes_with_linkdown" sysctl is set.
This is done by assigning them a negative region boundary.

However, when comparing the computed hash (unsigned) with the region
boundary (signed), the negative region boundary is treated as unsigned,
resulting in incorrect nexthop selection.

Fix by treating the computed hash as signed. Note that the computed hash
is always in range of [0, 2^31 - 1].

Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 net/ipv6/route.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
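
The signedness problem described in the commit message can be reproduced outside the kernel. The sketch below is illustrative only: `in_region_unsigned()` and `in_region_signed()` are hypothetical helpers, and the explicit cast in the first one spells out what C's usual arithmetic conversions do implicitly when an unsigned hash is compared with an `int` boundary (assuming a 32-bit `int`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Before the fix: the hash is unsigned, so the signed boundary is
 * converted to unsigned for the comparison. A boundary of -1 becomes
 * UINT32_MAX, and every hash compares as "within the region".
 */
static bool in_region_unsigned(uint32_t hash, int upper_bound)
{
	return hash <= (uint32_t)upper_bound;
}

/*
 * After the fix: the hash is copied into an int first, so the
 * comparison is signed and a negative boundary never matches.
 */
static bool in_region_signed(uint32_t hash, int upper_bound)
{
	int h;

	/* Precondition from the commit message: hash is in [0, 2^31 - 1]. */
	assert(hash <= (uint32_t)INT32_MAX);
	h = (int)hash;
	return h <= upper_bound;
}
```

For a link-down nexthop with boundary -1, `in_region_unsigned(12345, -1)` is true, which is the incorrect selection the patch fixes, while `in_region_signed(12345, -1)` is false. For non-negative boundaries the two helpers agree.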