mbox series

[net-next,v2,0/5] Allow configuration of multipath hash seed

Message ID 20240607151357.421181-1-petrm@nvidia.com (mailing list archive)
Headers show
Series Allow configuration of multipath hash seed | expand

Message

Petr Machata June 7, 2024, 3:13 p.m. UTC
Let me just quote the commit message of patch #2 here to inform the
motivation and some of the implementation:

    When calculating hashes for the purpose of multipath forwarding,
    both IPv4 and IPv6 code currently fall back on
    flow_hash_from_keys(). That uses a randomly-generated seed. That's a
    fine choice by default, but unfortunately some deployments may need
    a tighter control over the seed used.

    In this patchset, make the seed configurable by adding a new sysctl
    key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed
    is used specifically for multipath forwarding and not for the other
    concerns that flow_hash_from_keys() is used for, such as queue
    selection. Expose the knob as sysctl because other such settings,
    such as headers to hash, are also handled that way.

    Despite being placed in the net.ipv4 namespace, the multipath seed
    sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of
    TCP variables. Like those, the multipath hash seed is a per-netns
    variable.

    The seed used by flow_hash_from_keys() is a 128-bit quantity.
    However it seems that usually the seed is a much more modest value.
    32 bits seem typical (Cisco, Cumulus), some systems go even lower.
    For that reason, and to decouple the user interface from
    implementation details, go with a 32-bit quantity, which is then
    quadruplicated to form the siphash key.

One example of use of this interface is avoiding hash polarization,
where two ECMP routers, one behind the other, happen to make consistent
hashing decisions, and as a result, part of the ECMP space of the latter
router is never used. Another is a load balancer where several machines
forward traffic to one of a number of leaves, and the forwarding
decisions need to be made consistently. (This is a case of a desired
hash polarization, mentioned e.g. in chapter 6.3 of [0].)

There has already been a proposal to include a hash seed control
interface in the past[1].

- Patches #1-#2 contain the substance of the work
- Patch #3 is an mlxsw offload
- Patches #4 and #5 are a selftest

[0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
[1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/

v2:
- Patch #2:
    - Instead of precomputing the siphash key, construct it in place
      of use thus obviating the need to use RCU.
    - Instead of dispatching to the flow dissector for cases where
      user seed is 0, maintain a separate random seed. Initialize it
      early so that we can avoid a branch at the seed reader.
    - In documentation, s/only valid/only present/ (when
      CONFIG_IP_ROUTE_MULTIPATH). Also mention the algorithm is
      unspecified and unstable in principle.
- Patch #3:
    - Update to match changes in patch #2.
- Patch #4:
    - New patch.
- Patch #5:
    - Do not set seed on test init and run the stability tests first to catch
      the cases of a missed pernet seed init.

Petr Machata (5):
  net: ipv4,ipv6: Pass multipath hash computation through a helper
  net: ipv4: Add a sysctl to set multipath hash seed
  mlxsw: spectrum_router: Apply user-defined multipath hash seed
  selftests: forwarding: lib: Split sysctl_save() out of sysctl_set()
  selftests: forwarding: router_mpath_hash: Add a new selftest

 Documentation/networking/ip-sysctl.rst        |  14 +
 .../ethernet/mellanox/mlxsw/spectrum_router.c |   6 +-
 include/net/flow_dissector.h                  |   2 +
 include/net/ip_fib.h                          |  28 ++
 include/net/netns/ipv4.h                      |   8 +
 net/core/flow_dissector.c                     |   7 +
 net/ipv4/route.c                              |  12 +-
 net/ipv4/sysctl_net_ipv4.c                    |  66 ++++
 net/ipv6/route.c                              |  12 +-
 .../testing/selftests/net/forwarding/Makefile |   1 +
 tools/testing/selftests/net/forwarding/lib.sh |   9 +-
 .../net/forwarding/router_mpath_seed.sh       | 333 ++++++++++++++++++
 12 files changed, 484 insertions(+), 14 deletions(-)
 create mode 100755 tools/testing/selftests/net/forwarding/router_mpath_seed.sh

Comments

Nikolay Aleksandrov June 8, 2024, 10:39 a.m. UTC | #1
On 6/7/24 18:13, Petr Machata wrote:
> Let me just quote the commit message of patch #2 here to inform the
> motivation and some of the implementation:
> 
>     When calculating hashes for the purpose of multipath forwarding,
>     both IPv4 and IPv6 code currently fall back on
>     flow_hash_from_keys(). That uses a randomly-generated seed. That's a
>     fine choice by default, but unfortunately some deployments may need
>     a tighter control over the seed used.
> 
>     In this patchset, make the seed configurable by adding a new sysctl
>     key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed
>     is used specifically for multipath forwarding and not for the other
>     concerns that flow_hash_from_keys() is used for, such as queue
>     selection. Expose the knob as sysctl because other such settings,
>     such as headers to hash, are also handled that way.
> 
>     Despite being placed in the net.ipv4 namespace, the multipath seed
>     sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of
>     TCP variables. Like those, the multipath hash seed is a per-netns
>     variable.
> 
>     The seed used by flow_hash_from_keys() is a 128-bit quantity.
>     However it seems that usually the seed is a much more modest value.
>     32 bits seem typical (Cisco, Cumulus), some systems go even lower.
>     For that reason, and to decouple the user interface from
>     implementation details, go with a 32-bit quantity, which is then
>     quadruplicated to form the siphash key.
> 
> One example of use of this interface is avoiding hash polarization,
> where two ECMP routers, one behind the other, happen to make consistent
> hashing decisions, and as a result, part of the ECMP space of the latter
> router is never used. Another is a load balancer where several machines
> forward traffic to one of a number of leaves, and the forwarding
> decisions need to be made consistently. (This is a case of a desired
> hash polarization, mentioned e.g. in chapter 6.3 of [0].)
> 
> There has already been a proposal to include a hash seed control
> interface in the past[1].
> 
> - Patches #1-#2 contain the substance of the work
> - Patch #3 is an mlxsw offload
> - Patches #4 and #5 are a selftest
> 
> [0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
> [1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/
> 
> v2:
> - Patch #2:
>     - Instead of precomputing the siphash key, construct it in place
>       of use thus obviating the need to use RCU.
>     - Instead of dispatching to the flow dissector for cases where
>       user seed is 0, maintain a separate random seed. Initialize it
>       early so that we can avoid a branch at the seed reader.
>     - In documentation, s/only valid/only present/ (when
>       CONFIG_IP_ROUTE_MULTIPATH). Also mention the algorithm is
>       unspecified and unstable in principle.
> - Patch #3:
>     - Update to match changes in patch #2.
> - Patch #4:
>     - New patch.
> - Patch #5:
>     - Do not set seed on test init and run the stability tests first to catch
>       the cases of a missed pernet seed init.
> 
> Petr Machata (5):
>   net: ipv4,ipv6: Pass multipath hash computation through a helper
>   net: ipv4: Add a sysctl to set multipath hash seed
>   mlxsw: spectrum_router: Apply user-defined multipath hash seed
>   selftests: forwarding: lib: Split sysctl_save() out of sysctl_set()
>   selftests: forwarding: router_mpath_hash: Add a new selftest
> 
>  Documentation/networking/ip-sysctl.rst        |  14 +
>  .../ethernet/mellanox/mlxsw/spectrum_router.c |   6 +-
>  include/net/flow_dissector.h                  |   2 +
>  include/net/ip_fib.h                          |  28 ++
>  include/net/netns/ipv4.h                      |   8 +
>  net/core/flow_dissector.c                     |   7 +
>  net/ipv4/route.c                              |  12 +-
>  net/ipv4/sysctl_net_ipv4.c                    |  66 ++++
>  net/ipv6/route.c                              |  12 +-
>  .../testing/selftests/net/forwarding/Makefile |   1 +
>  tools/testing/selftests/net/forwarding/lib.sh |   9 +-
>  .../net/forwarding/router_mpath_seed.sh       | 333 ++++++++++++++++++
>  12 files changed, 484 insertions(+), 14 deletions(-)
>  create mode 100755 tools/testing/selftests/net/forwarding/router_mpath_seed.sh
> 

Looks good to me, thank you!
For the set,
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
David Ahern June 10, 2024, 3:21 p.m. UTC | #2
On 6/7/24 9:13 AM, Petr Machata wrote:
> Let me just quote the commit message of patch #2 here to inform the
> motivation and some of the implementation:
> 
>     When calculating hashes for the purpose of multipath forwarding,
>     both IPv4 and IPv6 code currently fall back on
>     flow_hash_from_keys(). That uses a randomly-generated seed. That's a
>     fine choice by default, but unfortunately some deployments may need
>     a tighter control over the seed used.
> 
>     In this patchset, make the seed configurable by adding a new sysctl
>     key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed
>     is used specifically for multipath forwarding and not for the other
>     concerns that flow_hash_from_keys() is used for, such as queue
>     selection. Expose the knob as sysctl because other such settings,
>     such as headers to hash, are also handled that way.
> 
>     Despite being placed in the net.ipv4 namespace, the multipath seed
>     sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of
>     TCP variables. Like those, the multipath hash seed is a per-netns
>     variable.
> 
>     The seed used by flow_hash_from_keys() is a 128-bit quantity.
>     However it seems that usually the seed is a much more modest value.
>     32 bits seem typical (Cisco, Cumulus), some systems go even lower.
>     For that reason, and to decouple the user interface from
>     implementation details, go with a 32-bit quantity, which is then
>     quadruplicated to form the siphash key.
> 
> One example of use of this interface is avoiding hash polarization,
> where two ECMP routers, one behind the other, happen to make consistent
> hashing decisions, and as a result, part of the ECMP space of the latter
> router is never used. Another is a load balancer where several machines
> forward traffic to one of a number of leaves, and the forwarding
> decisions need to be made consistently. (This is a case of a desired
> hash polarization, mentioned e.g. in chapter 6.3 of [0].)
> 
> There has already been a proposal to include a hash seed control
> interface in the past[1].
> 
> - Patches #1-#2 contain the substance of the work
> - Patch #3 is an mlxsw offload
> - Patches #4 and #5 are a selftest
> 
> [0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
> [1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/
> 
> v2:
> - Patch #2:
>     - Instead of precomputing the siphash key, construct it in place
>       of use thus obviating the need to use RCU.
>     - Instead of dispatching to the flow dissector for cases where
>       user seed is 0, maintain a separate random seed. Initialize it
>       early so that we can avoid a branch at the seed reader.
>     - In documentation, s/only valid/only present/ (when
>       CONFIG_IP_ROUTE_MULTIPATH). Also mention the algorithm is
>       unspecified and unstable in principle.
> - Patch #3:
>     - Update to match changes in patch #2.
> - Patch #4:
>     - New patch.
> - Patch #5:
>     - Do not set seed on test init and run the stability tests first to catch
>       the cases of a missed pernet seed init.
> 
> Petr Machata (5):
>   net: ipv4,ipv6: Pass multipath hash computation through a helper
>   net: ipv4: Add a sysctl to set multipath hash seed
>   mlxsw: spectrum_router: Apply user-defined multipath hash seed
>   selftests: forwarding: lib: Split sysctl_save() out of sysctl_set()
>   selftests: forwarding: router_mpath_hash: Add a new selftest
> 
>  Documentation/networking/ip-sysctl.rst        |  14 +
>  .../ethernet/mellanox/mlxsw/spectrum_router.c |   6 +-
>  include/net/flow_dissector.h                  |   2 +
>  include/net/ip_fib.h                          |  28 ++
>  include/net/netns/ipv4.h                      |   8 +
>  net/core/flow_dissector.c                     |   7 +
>  net/ipv4/route.c                              |  12 +-
>  net/ipv4/sysctl_net_ipv4.c                    |  66 ++++
>  net/ipv6/route.c                              |  12 +-
>  .../testing/selftests/net/forwarding/Makefile |   1 +
>  tools/testing/selftests/net/forwarding/lib.sh |   9 +-
>  .../net/forwarding/router_mpath_seed.sh       | 333 ++++++++++++++++++
>  12 files changed, 484 insertions(+), 14 deletions(-)
>  create mode 100755 tools/testing/selftests/net/forwarding/router_mpath_seed.sh
> 

For the set:

Reviewed-by: David Ahern <dsahern@kernel.org>
patchwork-bot+netdevbpf@kernel.org June 13, 2024, 12:50 a.m. UTC | #3
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 7 Jun 2024 17:13:52 +0200 you wrote:
> Let me just quote the commit message of patch #2 here to inform the
> motivation and some of the implementation:
> 
>     When calculating hashes for the purpose of multipath forwarding,
>     both IPv4 and IPv6 code currently fall back on
>     flow_hash_from_keys(). That uses a randomly-generated seed. That's a
>     fine choice by default, but unfortunately some deployments may need
>     a tighter control over the seed used.
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/5] net: ipv4,ipv6: Pass multipath hash computation through a helper
    https://git.kernel.org/netdev/net-next/c/3e453ca122d4
  - [net-next,v2,2/5] net: ipv4: Add a sysctl to set multipath hash seed
    https://git.kernel.org/netdev/net-next/c/4ee2a8cace3f
  - [net-next,v2,3/5] mlxsw: spectrum_router: Apply user-defined multipath hash seed
    https://git.kernel.org/netdev/net-next/c/60bcfede3f9f
  - [net-next,v2,4/5] selftests: forwarding: lib: Split sysctl_save() out of sysctl_set()
    https://git.kernel.org/netdev/net-next/c/6f51aed38a4f
  - [net-next,v2,5/5] selftests: forwarding: router_mpath_hash: Add a new selftest
    https://git.kernel.org/netdev/net-next/c/5f90d93b6108

You are awesome, thank you!