[RFC,net-next,v1,0/3] net: route: improve route hinting

Message ID d951b371-4138-4bda-a1c5-7606a28c81f0@gmail.com (mailing list archive)

Leone Fernando Jan. 25, 2024, 1:08 p.m. UTC
In 2019, Paolo Abeni introduced the hinting mechanism [1] to the routing
sub-system. The hinting optimization improves performance by reusing
previously found dsts instead of looking them up for each skb.

This RFC introduces a generalized version of the hinting mechanism that
can "remember" a larger number of dsts. This reduces the number of dst
lookups for frequently encountered daddrs.

Before diving into the code and the benchmarking results, it's important
to address the deletion of the old route cache [2] and why
this solution is different. The original cache was complicated,
vulnerable to DoS attacks, and had unstable performance.

The new input dst_cache is much simpler thanks to its lazy approach,
improving performance without the overhead of the removed cache
implementation. Instead of using timers and GC, the deletion of invalid
entries is performed lazily during their lookups.
The dsts are stored in a simple, lightweight, static hash table. This
keeps lookup times fast and stable, so a flood of cache misses cannot be
turned into a DoS.
The new input dst_cache implementation is built over the existing
dst_cache code which supplies a fast lockless percpu behavior.
I also plan to add a sysctl setting to provide finer tuning of the
cache size when needed (not implemented in this RFC).
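
To make the lazy approach concrete, here is a rough userspace sketch of
a fixed-size, bucketed cache with lazy invalidation. It is not the code
from this series: the toy_* names, the multiplicative hash, and the
round-robin replacement are assumptions made for illustration, and the
percpu and lockless aspects of dst_cache are left out. The sizes match
the ones used in the benchmark (512 entries in buckets of 4).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_ENTRIES 512                       /* total entries */
#define BUCKET_SIZE   4                         /* entries per bucket */
#define NUM_BUCKETS   (CACHE_ENTRIES / BUCKET_SIZE)

/* Hypothetical stand-in for a dst; 'obsolete' mimics an invalidated route. */
struct toy_dst {
	bool obsolete;
};

struct toy_entry {
	uint32_t daddr;
	struct toy_dst *dst;                    /* NULL means the slot is empty */
};

struct toy_cache {
	struct toy_entry slots[NUM_BUCKETS][BUCKET_SIZE];
	unsigned int victim[NUM_BUCKETS];       /* round-robin replacement cursor */
};

/* Assumed hash function; the real implementation may hash differently. */
static unsigned int toy_hash(uint32_t daddr)
{
	return (uint32_t)(daddr * 2654435761u) % NUM_BUCKETS;
}

/* At most BUCKET_SIZE probes, so a miss has a fixed, bounded cost. */
static struct toy_dst *toy_cache_lookup(struct toy_cache *c, uint32_t daddr)
{
	struct toy_entry *b = c->slots[toy_hash(daddr)];

	for (int i = 0; i < BUCKET_SIZE; i++) {
		if (!b[i].dst || b[i].daddr != daddr)
			continue;
		if (b[i].dst->obsolete) {
			b[i].dst = NULL;        /* lazy deletion: no timer, no GC */
			return NULL;
		}
		return b[i].dst;
	}
	return NULL;
}

/* Called after a miss, once the slow-path lookup has produced a dst. */
static void toy_cache_insert(struct toy_cache *c, uint32_t daddr,
			     struct toy_dst *dst)
{
	unsigned int h = toy_hash(daddr);
	struct toy_entry *b = c->slots[h];

	/* Prefer an empty slot, an invalid slot, or a stale entry for daddr. */
	for (int i = 0; i < BUCKET_SIZE; i++) {
		if (!b[i].dst || b[i].dst->obsolete || b[i].daddr == daddr) {
			b[i].daddr = daddr;
			b[i].dst = dst;
			return;
		}
	}

	/* Bucket is full of valid entries: overwrite one in round-robin order. */
	b[c->victim[h]].daddr = daddr;
	b[c->victim[h]].dst = dst;
	c->victim[h] = (c->victim[h] + 1) % BUCKET_SIZE;
}
```

Since the table is static and every lookup touches a single bucket,
neither a hit nor a miss ever walks more than BUCKET_SIZE entries.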

I tested this patch using UDP floods with different numbers of daddrs.
The benchmarking setup consists of 3 machines: a sender, a forwarder and
a receiver. I measured the PPS received by the receiver while the
forwarder was running either the mainline kernel or the patched kernel,
and compared the results. The dst_cache I tested in this benchmark
used a total of 512 hash table entries, split into buckets of 4
entries each.

These are the results:
  UDP             mainline              patched                   delta
conns pcpu         Kpps                  Kpps                       %
   1              274.0255              269.2205                  -1.75
   2              257.3748              268.0947                   4.17
  15              241.3513              258.8016                   7.23
 100              238.3419              258.4939                   8.46
 500              238.5390              252.6425                   5.91
1000              238.7570              242.1820                   1.43
2000              238.7780              236.2640                  -1.05
4000              239.0440              233.5320                  -2.31
8000              239.3248              232.5680                  -2.82

As you can see, this patch improves performance up to roughly 1500
connections. The gain peaks at around 100 connections and then shrinks
as the number of cache misses grows, turning into a small degradation
beyond that point.
It's important to note that in the worst-case scenario, every packet will
cause a cache miss, resulting in only a constant performance degradation
due to the fixed cache and bucket sizes. This means that the cache is
resistant to DoS attacks.

Based on the above measurements, it seems that the performance
degradation flattens at around 3%. Note that the number of concurrent
connections at which performance starts to degrade depends on the cache
size and the number of CPUs.

I would love to get your opinion on the following:
    - What would be a good default size for the cache? This depends on
      the number of daddrs the machine is expected to handle. Which kind
      of setup should we optimize for?

    - A possible improvement for machines that are expected to handle a
      large number of daddrs is to turn off the cache after a threshold
      of cache misses has been reached. The cache can then be turned on
      again after some period of time.
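
The second idea could look roughly like the sketch below. It is only an
illustration under assumptions: the cache_gate struct, the gate_*
helpers, both constants, and the choice to count consecutive misses
(rather than a miss rate) are hypothetical, and time is an abstract tick
counter instead of jiffies.

```c
#include <stdbool.h>
#include <stdint.h>

#define MISS_THRESHOLD 1000     /* hypothetical: misses in a row before disabling */
#define OFF_PERIOD     5000     /* hypothetical: ticks to keep the cache off */

struct cache_gate {
	unsigned int misses;    /* consecutive misses while enabled */
	bool enabled;
	uint64_t reenable_at;   /* tick at which the cache is turned back on */
};

/* Returns true if the cache should be consulted for this packet. */
static bool gate_cache_enabled(struct cache_gate *g, uint64_t now)
{
	if (!g->enabled && now >= g->reenable_at) {
		g->enabled = true;      /* off period is over, try again */
		g->misses = 0;
	}
	return g->enabled;
}

/* Record the outcome of one cache lookup. */
static void gate_record(struct cache_gate *g, bool hit, uint64_t now)
{
	if (!g->enabled)
		return;
	if (hit) {
		g->misses = 0;          /* any hit resets the streak */
		return;
	}
	if (++g->misses >= MISS_THRESHOLD) {
		g->enabled = false;     /* too many misses: bypass the cache */
		g->reenable_at = now + OFF_PERIOD;
	}
}
```

While the gate is off, packets would go straight to the regular lookup
and skip the cache probe entirely.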

Do you have any other ideas or suggestions?

Another problem I encountered is that if an skb finds its dst in the
dst_cache, its IPCB(skb)->flags are not updated during the routing
process, e.g., IPSKB_NOPOLICY and IPSKB_DOREDIRECT.
This can be fixed by moving the IPSKB_DOREDIRECT update to ip_forward.
The IPSKB_NOPOLICY flag is set in mkroute_input, local_input and
multicast, so maybe we can just move this logic to the end
of ip_rcv_finish_core.

What do you think? Do you have a better idea?

[1] https://lore.kernel.org/netdev/cover.1574252982.git.pabeni@redhat.com/
[2] https://lore.kernel.org/netdev/20120720.142502.1144557295933737451.davem@davemloft.net/

Leone Fernando (3):
  net: route: expire rt if the dst it holds is expired
  net: dst_cache: add input_dst_cache API
  net: route: replace route hints with input_dst_cache

 include/linux/percpu.h  |   4 ++
 include/net/dst_cache.h |  56 ++++++++++++++++
 include/net/route.h     |   6 +-
 net/core/dst_cache.c    | 145 ++++++++++++++++++++++++++++++++++++++++
 net/ipv4/ip_input.c     |  58 ++++++++--------
 net/ipv4/route.c        |  39 ++++++++---
 6 files changed, 268 insertions(+), 40 deletions(-)

--
2.34.1