Message ID | a887463fb219d973ec5ad275e31194812571f1f5.1712711977.git.asml.silence@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 7cb31c46b9cc37a5e564667fe46daf9a35dbafdd |
Delegated to: | Netdev Maintainers |
Series | optimise local CPU skb_attempt_defer_free
On Wed, Apr 10, 2024 at 3:28 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> Optimise skb_attempt_defer_free() when run by the same CPU the skb was
> allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can
> disable softirqs and put the buffer into cpu local caches.
>
> CPU bound TCP ping pong style benchmarking (i.e. netbench) showed a 1%
> throughput increase (392.2 -> 396.4 Krps). Cross checking with profiles,
> the total CPU share of skb_attempt_defer_free() dropped by 0.6%. Note,
> I'd expect the win doubled with rx only benchmarks, as the optimisation
> is for the receive path, but the test spends >55% of CPU doing writes.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>
On Wed, Apr 10, 2024 at 9:28 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> Optimise skb_attempt_defer_free() when run by the same CPU the skb was
> allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can
> disable softirqs and put the buffer into cpu local caches.
>
> CPU bound TCP ping pong style benchmarking (i.e. netbench) showed a 1%
> throughput increase (392.2 -> 396.4 Krps). Cross checking with profiles,
> the total CPU share of skb_attempt_defer_free() dropped by 0.6%. Note,
> I'd expect the win doubled with rx only benchmarks, as the optimisation
> is for the receive path, but the test spends >55% of CPU doing writes.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
```diff
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 21cd01641f4c..62b07ed3af98 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6974,6 +6974,19 @@ void __skb_ext_put(struct skb_ext *ext)
 EXPORT_SYMBOL(__skb_ext_put);
 #endif /* CONFIG_SKB_EXTENSIONS */
 
+static void kfree_skb_napi_cache(struct sk_buff *skb)
+{
+	/* if SKB is a clone, don't handle this case */
+	if (skb->fclone != SKB_FCLONE_UNAVAILABLE) {
+		__kfree_skb(skb);
+		return;
+	}
+
+	local_bh_disable();
+	__napi_kfree_skb(skb, SKB_DROP_REASON_NOT_SPECIFIED);
+	local_bh_enable();
+}
+
 /**
  * skb_attempt_defer_free - queue skb for remote freeing
  * @skb: buffer
@@ -6992,7 +7005,7 @@ void skb_attempt_defer_free(struct sk_buff *skb)
 
 	if (WARN_ON_ONCE(cpu >= nr_cpu_ids) || !cpu_online(cpu) ||
 	    cpu == raw_smp_processor_id()) {
-nodefer:	__kfree_skb(skb);
+nodefer:	kfree_skb_napi_cache(skb);
 		return;
 	}
```
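For readers unfamiliar with this path: __napi_kfree_skb() releases the skb's data (via skb_release_all()) and then parks the struct sk_buff head in a small per-CPU cache (napi_skb_cache_put() in net/core/skbuff.c), so the next allocation on that CPU can reuse the head without a round trip through the slab allocator. The same cache is touched from softirq (NAPI) context, which is why kfree_skb_napi_cache() brackets the call with local_bh_disable()/local_bh_enable(). A rough model of the idea follows; it is a sketch with invented names, not the kernel's actual code:

```c
/*
 * Illustrative sketch only -- not the kernel's implementation.
 * Models the kind of per-CPU cache that __napi_kfree_skb() feeds;
 * SKB_CPU_CACHE_SIZE, struct skb_cpu_cache, skb_cpu_cache_put() and
 * skb_head_cachep are invented names for this sketch.
 */
#define SKB_CPU_CACHE_SIZE	64	/* assumed cache depth */

struct skb_cpu_cache {
	unsigned int count;
	struct sk_buff *slots[SKB_CPU_CACHE_SIZE];
};

static DEFINE_PER_CPU(struct skb_cpu_cache, skb_cpu_cache);

/* Stand-in for the slab cache that sk_buff heads come from. */
static struct kmem_cache *skb_head_cachep;

/*
 * Expects an skb whose data has already been released, and expects
 * softirqs to be off (e.g. via local_bh_disable()), since the cache
 * is also filled and drained from NAPI/softirq context.
 */
static void skb_cpu_cache_put(struct sk_buff *skb)
{
	struct skb_cpu_cache *c = this_cpu_ptr(&skb_cpu_cache);

	if (c->count < SKB_CPU_CACHE_SIZE) {
		/* Park the head; the next alloc on this CPU reuses it. */
		c->slots[c->count++] = skb;
		return;
	}
	/* Cache full: fall back to a regular free into the slab. */
	kmem_cache_free(skb_head_cachep, skb);
}
```

The fclone check in the patch exists because fast-clone skbs are carved out of a different kmem cache and cannot be recycled through this path, hence the plain __kfree_skb() fallback for anything that is not SKB_FCLONE_UNAVAILABLE.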
Optimise skb_attempt_defer_free() when run by the same CPU the skb was
allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can
disable softirqs and put the buffer into cpu local caches.

CPU bound TCP ping pong style benchmarking (i.e. netbench) showed a 1%
throughput increase (392.2 -> 396.4 Krps). Cross checking with profiles,
the total CPU share of skb_attempt_defer_free() dropped by 0.6%. Note,
I'd expect the win doubled with rx only benchmarks, as the optimisation
is for the receive path, but the test spends >55% of CPU doing writes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/core/skbuff.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
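The change is transparent to callers: anything that hands a consumed skb to skb_attempt_defer_free(), notably the TCP receive path, now gets the NAPI-cache free when the skb is already on the CPU it was allocated on, and the cross-CPU defer list otherwise. A hedged sketch of the calling pattern, where consume_rx_skb() is a made-up name for illustration:

```c
/*
 * Hedged sketch of the consumer side; consume_rx_skb() is invented
 * for illustration and is not a kernel function.
 */
static void consume_rx_skb(struct sock *sk, struct sk_buff *skb)
{
	/* Drop the fully-read skb from the socket's receive queue. */
	__skb_unlink(skb, &sk->sk_receive_queue);

	/*
	 * If skb->alloc_cpu matches the running CPU, this patch frees
	 * the skb via the local NAPI cache instead of going straight
	 * to the slab allocator; otherwise the skb is queued on the
	 * allocating CPU's defer list, as before.
	 */
	skb_attempt_defer_free(skb);
}
```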