Message ID | 20240729104741.370327-1-leitao@debian.org (mailing list archive) |
---|---|
State | Accepted |
Commit | c9c0ee5f20c593faf289fa8850c3ed84031dd12a |
Delegated to: | Netdev Maintainers |
Series | [net-next] net: skbuff: Skip early return in skb_unref when debugging |
On 7/29/24 12:47, Breno Leitao wrote:
> This patch modifies the skb_unref function to skip the early return
> optimization when CONFIG_DEBUG_NET is enabled. The change ensures that
> the reference count decrement always occurs in debug builds, allowing
> for more thorough checking of SKB reference counting.
>
> Previously, when the SKB's reference count was 1 and CONFIG_DEBUG_NET
> was not set, the function would return early after a memory barrier
> (smp_rmb()) without decrementing the reference count. This optimization
> assumes it's safe to proceed with freeing the SKB without the overhead
> of an atomic decrement from 1 to 0.
>
> With this change:
> - In non-debug builds (CONFIG_DEBUG_NET not set), behavior remains
>   unchanged, preserving the performance optimization.
> - In debug builds (CONFIG_DEBUG_NET set), the reference count is always
>   decremented, even when it's 1, allowing for consistent behavior and
>   potentially catching subtle SKB management bugs.
>
> This modification enhances debugging capabilities for networking code
> without impacting performance in production kernels. It helps kernel
> developers identify and diagnose issues related to SKB management and
> reference counting in the network stack.
>
> Cc: Chris Mason <clm@fb.com>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  include/linux/skbuff.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 29c3ea5b6e93..cf8f6ce06742 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1225,7 +1225,7 @@ static inline bool skb_unref(struct sk_buff *skb)
>  {
>  	if (unlikely(!skb))
>  		return false;
> -	if (likely(refcount_read(&skb->users) == 1))
> +	if (!IS_ENABLED(CONFIG_DEBUG_NET) && likely(refcount_read(&skb->users) == 1))
>  		smp_rmb();
>  	else if (likely(!refcount_dec_and_test(&skb->users)))
>  		return false;

I think one assumption behind CONFIG_DEBUG_NET is that enabling such a
config should not have any measurable impact on performance.

I suspect the above could indeed cause some measurable impact, e.g.
under UDP flood, when the user-space receiver and the BH run on
different cores, as this will increase pressure on the CPU cache. Could
you please benchmark such a scenario before and after this patch?

Thanks!

Paolo
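A userspace sketch of the cache-line effect Paolo describes (illustrative
only: C11 atomics stand in for the kernel's refcount_t, and the
unref_fast/unref_debug names are made up). A plain atomic load leaves the
cache line in a shared state, while the read-modify-write that
refcount_dec_and_test() performs must take the line exclusive, bouncing it
between the BH core and the receiver core:

#include <stdatomic.h>
#include <stdbool.h>

struct skb_stub { atomic_uint users; };

/* Fast path (CONFIG_DEBUG_NET=n): a read-only check. The cache line
 * holding 'users' can stay in the shared state on every core. */
bool unref_fast(struct skb_stub *skb)
{
	if (atomic_load_explicit(&skb->users, memory_order_acquire) == 1)
		return true;	/* sole owner; mirrors the smp_rmb() path */
	/* fetch_sub returns the old value: 1 -> 0 means last reference */
	return atomic_fetch_sub_explicit(&skb->users, 1,
					 memory_order_acq_rel) == 1;
}

/* Debug path (CONFIG_DEBUG_NET=y): an atomic read-modify-write on every
 * release, which forces exclusive ownership of the cache line. */
bool unref_debug(struct skb_stub *skb)
{
	return atomic_fetch_sub_explicit(&skb->users, 1,
					 memory_order_acq_rel) == 1;
}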
Hello Paolo,

On Tue, Jul 30, 2024 at 11:38:38AM +0200, Paolo Abeni wrote:
> On 7/29/24 12:47, Breno Leitao wrote:
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -1225,7 +1225,7 @@ static inline bool skb_unref(struct sk_buff *skb)
> >  {
> >  	if (unlikely(!skb))
> >  		return false;
> > -	if (likely(refcount_read(&skb->users) == 1))
> > +	if (!IS_ENABLED(CONFIG_DEBUG_NET) && likely(refcount_read(&skb->users) == 1))
> >  		smp_rmb();
> >  	else if (likely(!refcount_dec_and_test(&skb->users)))
> >  		return false;
>
> I think one assumption behind CONFIG_DEBUG_NET is that enabling such a
> config should not have any measurable impact on performance.
>
> I suspect the above could indeed cause some measurable impact, e.g.
> under UDP flood, when the user-space receiver and the BH run on
> different cores, as this will increase pressure on the CPU cache. Could
> you please benchmark such a scenario before and after this patch?

Sure, I am more than happy to do so. I will be back with it soon.

Assuming there is some performance overhead, isn't it a worthwhile
trade-off for those who are debugging the network? In other words,
wouldn't it be better to prioritize correctness over optimization in
the CONFIG_DEBUG_NET case, even if it means sacrificing some
performance?

Thanks for reviewing,
--breno
Paolo Abeni <pabeni@redhat.com> wrote:
> > 	else if (likely(!refcount_dec_and_test(&skb->users)))
> > 		return false;
>
> I think one assumption behind CONFIG_DEBUG_NET is that enabling such a
> config should not have any measurable impact on performance.

If that's the case, why does it exist at all?

I was under the impression that the entire reason for CONFIG_DEBUG_NET
was to enable more checks for fuzzers and the like, i.e. NOT for
production kernels.
On 7/30/24 12:50, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
>>> 	else if (likely(!refcount_dec_and_test(&skb->users)))
>>> 		return false;
>>
>> I think one assumption behind CONFIG_DEBUG_NET is that enabling such a
>> config should not have any measurable impact on performance.
>
> If that's the case, why does it exist at all?
>
> I was under the impression that the entire reason for CONFIG_DEBUG_NET
> was to enable more checks for fuzzers and the like, i.e. NOT for
> production kernels.

I feel like I already had this discussion and forgot the outcome; if
so, I'm sorry. To me, the "but is safe to select." part of the knob
description means this could be enabled in production, and AFAICS the
CONFIG_DEBUG_NET-enabled code so far respects that assumption.

Thanks,

Paolo
On Tue, 30 Jul 2024 13:15:57 +0200 Paolo Abeni wrote:
> > If that's the case, why does it exist at all?

+1

> > I was under the impression that the entire reason for CONFIG_DEBUG_NET
> > was to enable more checks for fuzzers and the like, i.e. NOT for
> > production kernels.
>
> I feel like I already had this discussion and forgot the outcome; if
> so, I'm sorry. To me, the "but is safe to select." part of the knob
> description means this could be enabled in production, and AFAICS the
> CONFIG_DEBUG_NET-enabled code so far respects that assumption.

I believe the previous discussion was page pool specific, and there
wasn't so much a conclusion as an acquiescence (read: we had more
important things on our minds than that argument ;)).

Should we set a bar for how much perf impact is okay?

FTR, I suspect there will be no measurable perf impact here.
On Tue, Jul 30, 2024 at 4:10 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 30 Jul 2024 13:15:57 +0200 Paolo Abeni wrote:
> > > If that's the case, why does it exist at all?
>
> +1
>
> > > I was under the impression that the entire reason for CONFIG_DEBUG_NET
> > > was to enable more checks for fuzzers and the like, i.e. NOT for
> > > production kernels.
> >
> > I feel like I already had this discussion and forgot the outcome; if
> > so, I'm sorry. To me, the "but is safe to select." part of the knob
> > description means this could be enabled in production, and AFAICS the
> > CONFIG_DEBUG_NET-enabled code so far respects that assumption.
>
> I believe the previous discussion was page pool specific, and there
> wasn't so much a conclusion as an acquiescence (read: we had more
> important things on our minds than that argument ;)).
>
> Should we set a bar for how much perf impact is okay?
>
> FTR, I suspect there will be no measurable perf impact here.

Same feeling here, I think this patch is fine really.

Reviewed-by: Eric Dumazet <edumazet@google.com>
On 7/30/24 16:10, Jakub Kicinski wrote:
> On Tue, 30 Jul 2024 13:15:57 +0200 Paolo Abeni wrote:
>>> I was under the impression that the entire reason for CONFIG_DEBUG_NET
>>> was to enable more checks for fuzzers and the like, i.e. NOT for
>>> production kernels.
>>
>> I feel like I already had this discussion and forgot the outcome; if
>> so, I'm sorry. To me, the "but is safe to select." part of the knob
>> description means this could be enabled in production, and AFAICS the
>> CONFIG_DEBUG_NET-enabled code so far respects that assumption.
>
> I believe the previous discussion was page pool specific, and there
> wasn't so much a conclusion as an acquiescence (read: we had more
> important things on our minds than that argument ;)).
>
> Should we set a bar for how much perf impact is okay?

I think better specifying the general guidance/expectation should be
enough. What about extending the knob description with something like:
---
diff --git a/net/Kconfig.debug b/net/Kconfig.debug
index 5e3fffe707dd..058cf031913b 100644
--- a/net/Kconfig.debug
+++ b/net/Kconfig.debug
@@ -24,3 +24,5 @@ config DEBUG_NET
 	help
 	  Enable extra sanity checks in networking.
 	  This is mostly used by fuzzers, but is safe to select.
+	  This could introduce some very minimal overhead and
+	  is not suggested for production systems.
On Tue, 30 Jul 2024 16:37:10 +0200 Paolo Abeni wrote:
> I think better specifying the general guidance/expectation should be
> enough. What about extending the knob description with something like:
> ---
> diff --git a/net/Kconfig.debug b/net/Kconfig.debug
> index 5e3fffe707dd..058cf031913b 100644
> --- a/net/Kconfig.debug
> +++ b/net/Kconfig.debug
> @@ -24,3 +24,5 @@ config DEBUG_NET
>  	help
>  	  Enable extra sanity checks in networking.
>  	  This is mostly used by fuzzers, but is safe to select.
> +	  This could introduce some very minimal overhead and
> +	  is not suggested for production systems.

I'd go with:

	  Enable extra sanity checks in networking.
-	  This is mostly used by fuzzers, but is safe to select.
+	  This is mostly used by fuzzers, and may introduce some
+	  minimal overhead, but is safe to select.

What's acceptable on prod systems really depends on the workload..
Hello Paolo,
On Tue, Jul 30, 2024 at 11:38:38AM +0200, Paolo Abeni wrote:
> Could you please benchmark such a scenario before and after this patch?
I've tested it on an 18-core Xeon D-2191A host, and I haven't found any
difference in either TX/RX in TCP or UDP. At the same time, I must admit
that I have very low confidence in my tests.
I ran the following tests 10x on the same machine, just toggling my
patch, and took the simple average of these 10 iterations. This is
what I am doing for TCP and UDP:
TCP:
# iperf -s &
# iperf -c localhost
Output: 16.5 Gbits/sec
UDP:
# iperf -s -u &
# iperf -u -c localhost
Output: 1.05 Mbits/sec
I don't know how to explain why UDP numbers are so low. I am happy to
run different tests, if you have any other recommendation.
--breno
On Wed, Jul 31, 2024 at 7:25 PM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Paolo,
>
> On Tue, Jul 30, 2024 at 11:38:38AM +0200, Paolo Abeni wrote:
> > Could you please benchmark such a scenario before and after this patch?
>
> I've tested it on an 18-core Xeon D-2191A host, and I haven't found any
> difference in either TX/RX in TCP or UDP. At the same time, I must admit
> that I have very low confidence in my tests.
>
> I ran the following tests 10x on the same machine, just toggling my
> patch, and took the simple average of these 10 iterations. This is
> what I am doing for TCP and UDP:
>
> TCP:
> 	# iperf -s &
> 	# iperf -c localhost
>
> Output: 16.5 Gbits/sec
>
> UDP:
> 	# iperf -s -u &
> 	# iperf -u -c localhost
>
> Output: 1.05 Mbits/sec
>
> I don't know how to explain why UDP numbers are so low. I am happy to
> run different tests, if you have any other recommendation.

I think the iperf tool uses '-b 1' as default, which is explained in
the man page:

CLIENT SPECIFIC OPTIONS
       -b, --bandwidth n[kmgKMG][,n[kmgKMG]] | n[kmgKMG]pps
              set target bandwidth to n bits/sec (default 1 Mbit/sec) or n
              packets per sec. This may be used with TCP or UDP. Optionally,
              for variable loads, use format of mean,standard deviation

So, if you try a parameter like '-b 40MB', it will reach around
40MB/sec.

If I were you, I would try the following:
1) iperf3 -s
2) iperf3 -u -c 127.0.0.1 -b 0 -l 64

Hope it helps :)

Thanks,
Jason
On 7/31/24 13:24, Breno Leitao wrote:
> Hello Paolo,
>
> On Tue, Jul 30, 2024 at 11:38:38AM +0200, Paolo Abeni wrote:
>> Could you please benchmark such a scenario before and after this patch?
>
> I've tested it on an 18-core Xeon D-2191A host, and I haven't found any
> difference in either TX/RX in TCP or UDP. At the same time, I must admit
> that I have very low confidence in my tests.
>
> I ran the following tests 10x on the same machine, just toggling my
> patch, and took the simple average of these 10 iterations. This is
> what I am doing for TCP and UDP:
>
> TCP:
> 	# iperf -s &
> 	# iperf -c localhost
>
> Output: 16.5 Gbits/sec
>
> UDP:
> 	# iperf -s -u &
> 	# iperf -u -c localhost
>
> Output: 1.05 Mbits/sec
>
> I don't know how to explain why UDP numbers are so low. I am happy to
> run different tests, if you have any other recommendation.

Beyond the '-b 0' argument, as noted by Jason, you need to do manual
CPU pinning of both the sender and the receiver. Additionally, to
really flood the receiver you likely have to place the sender on a
different host.

In any case, given all the prior discussion, I don't intend to block
this patch.

Cheers,

Paolo
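The pinning Paolo suggests is usually done by wrapping the benchmark
binaries, e.g. "taskset -c 0 iperf -s" for the receiver and
"taskset -c 8 iperf -c <server>" for the sender. As a hypothetical
illustration (the core numbers are arbitrary, and this is not part of
the thread's test setup), the same thing can be done from inside a C
program with sched_setaffinity():

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Pin the calling process (pid 0 = self) to a single CPU. */
static void pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) != 0) {
		perror("sched_setaffinity");
		exit(EXIT_FAILURE);
	}
}

int main(void)
{
	pin_to_cpu(0);	/* e.g. receiver on core 0; run the sender pinned
			 * to a different core (or host) */
	/* ... benchmark loop would go here ... */
	return 0;
}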
Hello:

This patch was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 29 Jul 2024 03:47:40 -0700 you wrote:
> This patch modifies the skb_unref function to skip the early return
> optimization when CONFIG_DEBUG_NET is enabled. The change ensures that
> the reference count decrement always occurs in debug builds, allowing
> for more thorough checking of SKB reference counting.
>
> Previously, when the SKB's reference count was 1 and CONFIG_DEBUG_NET
> was not set, the function would return early after a memory barrier
> (smp_rmb()) without decrementing the reference count. This optimization
> assumes it's safe to proceed with freeing the SKB without the overhead
> of an atomic decrement from 1 to 0.
>
> [...]

Here is the summary with links:
  - [net-next] net: skbuff: Skip early return in skb_unref when debugging
    https://git.kernel.org/netdev/net-next/c/c9c0ee5f20c5

You are awesome, thank you!
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 29c3ea5b6e93..cf8f6ce06742 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1225,7 +1225,7 @@ static inline bool skb_unref(struct sk_buff *skb)
 {
 	if (unlikely(!skb))
 		return false;
-	if (likely(refcount_read(&skb->users) == 1))
+	if (!IS_ENABLED(CONFIG_DEBUG_NET) && likely(refcount_read(&skb->users) == 1))
 		smp_rmb();
 	else if (likely(!refcount_dec_and_test(&skb->users)))
 		return false;
This patch modifies the skb_unref function to skip the early return
optimization when CONFIG_DEBUG_NET is enabled. The change ensures that
the reference count decrement always occurs in debug builds, allowing
for more thorough checking of SKB reference counting.

Previously, when the SKB's reference count was 1 and CONFIG_DEBUG_NET
was not set, the function would return early after a memory barrier
(smp_rmb()) without decrementing the reference count. This optimization
assumes it's safe to proceed with freeing the SKB without the overhead
of an atomic decrement from 1 to 0.

With this change:
- In non-debug builds (CONFIG_DEBUG_NET not set), behavior remains
  unchanged, preserving the performance optimization.
- In debug builds (CONFIG_DEBUG_NET set), the reference count is always
  decremented, even when it's 1, allowing for consistent behavior and
  potentially catching subtle SKB management bugs.

This modification enhances debugging capabilities for networking code
without impacting performance in production kernels. It helps kernel
developers identify and diagnose issues related to SKB management and
reference counting in the network stack.

Cc: Chris Mason <clm@fb.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/skbuff.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
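As an illustration of the mechanism (the skb_unref_nondebug/skb_unref_debug
names below are hypothetical, not upstream identifiers):
IS_ENABLED(CONFIG_DEBUG_NET) expands to a compile-time 0 or 1, so each
configuration effectively compiles the patched skb_unref() down to one of
the following sketches.

/* CONFIG_DEBUG_NET=n: the !IS_ENABLED() test is 1, so the original
 * read-only fast path survives constant folding unchanged. */
static inline bool skb_unref_nondebug(struct sk_buff *skb)
{
	if (unlikely(!skb))
		return false;
	if (likely(refcount_read(&skb->users) == 1))
		smp_rmb();	/* plain read plus barrier, no atomic write */
	else if (likely(!refcount_dec_and_test(&skb->users)))
		return false;
	return true;
}

/* CONFIG_DEBUG_NET=y: the !IS_ENABLED() test is 0, the shortcut becomes
 * dead code, and every release goes through refcount_dec_and_test(), so
 * refcount_t's underflow/saturation checks see each transition, including
 * the final 1 -> 0. */
static inline bool skb_unref_debug(struct sk_buff *skb)
{
	if (unlikely(!skb))
		return false;
	if (likely(!refcount_dec_and_test(&skb->users)))
		return false;
	return true;
}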