Message ID | 20240307171202.232684-1-leone4fernando@gmail.com (mailing list archive)
---|---
Series | net: route: improve route hinting
On 3/7/24 10:11 AM, Leone Fernando wrote:
> In 2017, Paolo Abeni introduced the hinting mechanism [1] to the routing
> sub-system. The hinting optimization improves performance by reusing
> previously found dsts instead of looking them up for each skb.
>
> This patch series introduces a generalized version of the hinting
> mechanism that can "remember" a larger number of dsts. This reduces the
> number of dst lookups for frequently encountered daddrs.
>
> Before diving into the code and the benchmarking results, it's important
> to address the deletion of the old route cache [2] and why this solution
> is different. The original cache was complicated, vulnerable to DoS
> attacks, and had unstable performance.
>
> The new input dst_cache is much simpler thanks to its lazy approach,
> improving performance without the overhead of the removed cache
> implementation. Instead of using timers and GC, the deletion of invalid
> entries is performed lazily during their lookups. The dsts are stored in
> a simple, lightweight, static hash table. This keeps the lookup times
> fast yet stable, preventing DoS upon cache misses. The new input
> dst_cache implementation is built over the existing dst_cache code,
> which supplies fast lockless per-CPU behavior.
>
> I tested this patch using UDP floods with different numbers of daddrs.
> The benchmarking setup comprises 3 machines: a sender, a forwarder and a
> receiver. I measured the PPS received by the receiver as the forwarder
> was running either the mainline kernel or the patched kernel, comparing
> the results. The dst_cache I tested in this benchmark used a total of
> 512 hash table entries, split into buckets of 4 entries each.
> These are the results:
>
> UDP conns   mainline   patched    delta
> per cpu     Kpps       Kpps       %
> 1           274.0255   269.2205   -1.75
> 2           257.3748   268.0947    4.17
> 15          241.3513   258.8016    7.23
> 100         238.3419   258.4939    8.46
> 500         238.5390   252.6425    5.91
> 1000        238.7570   242.1820    1.43
> 2000        238.7780   236.2640   -1.05
> 4000        239.0440   233.5320   -2.31
> 8000        239.3248   232.5680   -2.82

I have looked at all of the sets sent. I cannot convince myself this is a
good idea, but at the same time I do not have constructive feedback on why
it is not acceptable. The gains are modest at best.
David Ahern wrote:
> I have looked at all of the sets sent. I cannot convince myself this is
> a good idea, but at the same time I do not have constructive feedback on
> why it is not acceptable. The gains are modest at best.

Thanks for the comment. I believe an improvement of 5-8% in PPS is
significant. Note that the cache is per-CPU (e.g., for a machine with 10
CPUs, the improvement affects 10X the conns mentioned).

Could you please provide more information about what you don't like in
the patch? Some possible issues I can think of:

- Do you think the improvement does not affect the common case? If so,
  this can be solved by tweaking the cache parameters.
- If the performance degradation for some numbers of conns is
  problematic, we can find ways to reduce it. For example, the
  degradation for 1 connection can probably be solved by keeping the
  route hints.

Leone
Hi David,

I plan to continue working on this patch, and it would be helpful if you
could share some more thoughts. As I said before, I think this patch is
significant, and my measurements showed a consistent improvement in most
cases.

Thanks,
Leone
On 4/2/24 4:08 AM, Leone Fernando wrote:
> Hi David,
>
> I plan to continue working on this patch, and it would be helpful if
> you could share some more thoughts. As I said before, I think this
> patch is significant, and my measurements showed a consistent
> improvement in most cases.

It seems to me patch 1 and a version of it for IPv6 should go in
independent of this set.

For the rest of it, Jakub's response was a good summary: it is hard to
know if there is a benefit to real workloads. Caches consume resources
(memory and CPU) and will be wrong some percentage of the time,
increasing overhead. Also, it is targeted at a very narrow use case --
IPv4 only, no custom FIB rules and no multipath.
Hi David,

> For the rest of it, Jakub's response was a good summary: it is hard to
> know if there is a benefit to real workloads. Caches consume resources
> (memory and CPU) and will be wrong some percentage of the time,
> increasing overhead.

I understand what you are saying here. Do you have an idea for extra
tests or measurements to make sure the patch also improves real
workloads?

> Also, it is targeted at a very narrow use case -- IPv4 only, no custom
> FIB rules and no multipath.

The implementation for IPv6 is almost identical. I decided to start with
IPv4, and I plan to submit IPv6 support in a future patch. This patch
basically improves the hinting mechanism that Paolo introduced, and it
handles the same cases. Adding support for FIB rules and multipath is a
bit more complicated but also possible. I am willing to keep working on
this patch to cover the remaining cases, including IPv6.

Thanks,
Leone