diff mbox series

[net-next,9/9] net: skbuff: always try to recycle PP pages directly when in softirq

Message ID 20230727144336.1646454-10-aleksander.lobakin@intel.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series page_pool: a couple of assorted optimizations | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1344 this patch: 1344
netdev/cc_maintainers warning 4 maintainers not CCed: daniel@iogearbox.net ast@kernel.org bpf@vger.kernel.org john.fastabend@gmail.com
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1367 this patch: 1367
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Alexander Lobakin July 27, 2023, 2:43 p.m. UTC
Commit 8c48eea3adf3 ("page_pool: allow caching from safely localized
NAPI") allowed direct recycling of skb pages to their PP for some cases,
but unfortunately missed a couple of other majors.
For example, %XDP_DROP in skb mode. The netstack just calls kfree_skb(),
which unconditionally passes `false` as @napi_safe. Thus, all pages go
through ptr_ring and locks, although most of time we're actually inside
the NAPI polling this PP is linked with, so that it would be perfectly
safe to recycle pages directly.
Let's address such. If @napi_safe is true, we're fine, don't change
anything for this path. But if it's false, check whether we are in the
softirq context. It will most likely be so and then if ->list_owner
is our current CPU, we're good to use direct recycling, even though
@napi_safe is false -- concurrent access is excluded. in_softirq()
protection is needed mostly due to we can hit this place in the
process context (not the hardirq though).
For the mentioned xdp-drop-skb-mode case, the improvement I got is
3-4% in Mpps. As for page_pool stats, recycle_ring is now 0 and
alloc_slow counter doesn't change most of time, which means the
MM layer is not even called to allocate any new pages.

Suggested-by: Jakub Kicinski <kuba@kernel.org> # in_softirq()
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 net/core/skbuff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jesper Dangaard Brouer July 28, 2023, 9:32 a.m. UTC | #1
On 27/07/2023 16.43, Alexander Lobakin wrote:
> Commit 8c48eea3adf3 ("page_pool: allow caching from safely localized
> NAPI") allowed direct recycling of skb pages to their PP for some cases,
> but unfortunately missed a couple of other majors.
> For example, %XDP_DROP in skb mode. The netstack just calls kfree_skb(),
> which unconditionally passes `false` as @napi_safe. Thus, all pages go
> through ptr_ring and locks, although most of time we're actually inside
> the NAPI polling this PP is linked with, so that it would be perfectly
> safe to recycle pages directly.

The commit messages is hard to read. It would help me as the reader if
you used a empty line between paragraphs, like in this location (same
goes for other commit descs).

> Let's address such. If @napi_safe is true, we're fine, don't change
> anything for this path. But if it's false, check whether we are in the
> softirq context. It will most likely be so and then if ->list_owner
> is our current CPU, we're good to use direct recycling, even though
> @napi_safe is false -- concurrent access is excluded. in_softirq()
> protection is needed mostly due to we can hit this place in the
> process context (not the hardirq though).

This patch make me a little nervous, as it can create hard-to-debug bugs
if this isn't 100% correct.  (Thanks for previous patch that exclude
hardirq via lockdep).

> For the mentioned xdp-drop-skb-mode case, the improvement I got is
> 3-4% in Mpps. As for page_pool stats, recycle_ring is now 0 and
> alloc_slow counter doesn't change most of time, which means the
> MM layer is not even called to allocate any new pages.
> 
> Suggested-by: Jakub Kicinski <kuba@kernel.org> # in_softirq()
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>   net/core/skbuff.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index e701401092d7..5ba3948cceed 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -901,8 +901,10 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
>   	/* Allow direct recycle if we have reasons to believe that we are
>   	 * in the same context as the consumer would run, so there's
>   	 * no possible race.
> +	 * __page_pool_put_page() makes sure we're not in hardirq context
> +	 * and interrupts are enabled prior to accessing the cache.
>   	 */
> -	if (napi_safe) {
> +	if (napi_safe || in_softirq()) {

I used to have in_serving_softirq() in PP to exclude process context
that just disabled BH to do direct recycling (into a lockless array).
This changed in kernel v6.3 commit 542bcea4be86 ("net: page_pool: use
in_softirq() instead") to help threaded NAPI.  I guess, nothing blew up
so I guess this was okay to relax this.

>   		const struct napi_struct *napi = READ_ONCE(pp->p.napi);
>   
>   		allow_direct = napi &&

AFAIK this in_softirq() will allow process context with disabled BH to
also recycle directly into the PP lockless array.  With the additional
checks (that are just outside above diff-context) that I assume makes
sure CPU (smp_processor_id()) also match.  Is this safe?

--Jesper
Alexander Lobakin July 28, 2023, 1:50 p.m. UTC | #2
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
Date: Fri, 28 Jul 2023 11:32:01 +0200

> 
> 
> On 27/07/2023 16.43, Alexander Lobakin wrote:
>> Commit 8c48eea3adf3 ("page_pool: allow caching from safely localized
>> NAPI") allowed direct recycling of skb pages to their PP for some cases,
>> but unfortunately missed a couple of other majors.
>> For example, %XDP_DROP in skb mode. The netstack just calls kfree_skb(),
>> which unconditionally passes `false` as @napi_safe. Thus, all pages go
>> through ptr_ring and locks, although most of time we're actually inside
>> the NAPI polling this PP is linked with, so that it would be perfectly
>> safe to recycle pages directly.
> 
> The commit messages is hard to read. It would help me as the reader if
> you used a empty line between paragraphs, like in this location (same
> goes for other commit descs).

O_o
I paste empty line basing on logics. These two don't have it, as the
second paragraph is the continuation of the first: it expands what I
mean by "a couple of other majors".
Do you want to have empty newlines between each paragraph instead?

> 
>> Let's address such. If @napi_safe is true, we're fine, don't change
>> anything for this path. But if it's false, check whether we are in the
>> softirq context. It will most likely be so and then if ->list_owner
>> is our current CPU, we're good to use direct recycling, even though
>> @napi_safe is false -- concurrent access is excluded. in_softirq()
>> protection is needed mostly due to we can hit this place in the
>> process context (not the hardirq though).
> 
> This patch make me a little nervous, as it can create hard-to-debug bugs
> if this isn't 100% correct.  (Thanks for previous patch that exclude
> hardirq via lockdep).

Pretty much any -next patch can create "hard-to-debug" bugs. Not a
reason to avoid any improvements, tho?
Speaking of this particular patch, can you give an example of situation
where this wouldn't be correct?

> 
>> For the mentioned xdp-drop-skb-mode case, the improvement I got is
>> 3-4% in Mpps. As for page_pool stats, recycle_ring is now 0 and
>> alloc_slow counter doesn't change most of time, which means the
>> MM layer is not even called to allocate any new pages.
>>
>> Suggested-by: Jakub Kicinski <kuba@kernel.org> # in_softirq()
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>> ---
>>   net/core/skbuff.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index e701401092d7..5ba3948cceed 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -901,8 +901,10 @@ bool page_pool_return_skb_page(struct page *page,
>> bool napi_safe)
>>       /* Allow direct recycle if we have reasons to believe that we are
>>        * in the same context as the consumer would run, so there's
>>        * no possible race.
>> +     * __page_pool_put_page() makes sure we're not in hardirq context
>> +     * and interrupts are enabled prior to accessing the cache.
>>        */
>> -    if (napi_safe) {
>> +    if (napi_safe || in_softirq()) {
> 
> I used to have in_serving_softirq() in PP to exclude process context
> that just disabled BH to do direct recycling (into a lockless array).
> This changed in kernel v6.3 commit 542bcea4be86 ("net: page_pool: use
> in_softirq() instead") to help threaded NAPI.  I guess, nothing blew up
> so I guess this was okay to relax this.

(below)

> 
>>           const struct napi_struct *napi = READ_ONCE(pp->p.napi);
>>             allow_direct = napi &&
> 
> AFAIK this in_softirq() will allow process context with disabled BH to
> also recycle directly into the PP lockless array.  With the additional
> checks (that are just outside above diff-context) that I assume makes
> sure CPU (smp_processor_id()) also match.  Is this safe?

Disabling BH also disables preemption. smp_processor_id() can give wrong
values only when preemption is enabled (see get_cpu()/put_cpu()).
Also look at how threaded NAPI and busy polling call NAPI polling
callbacks. They just disable BH. And nobody ever said that it's not safe
to call smp_processor_id() in the NAPI polling callbacks.

When your context matches and the processor ID matches, how could you
provoke concurrent access?

> 
> --Jesper
> 

Thanks,
Olek
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e701401092d7..5ba3948cceed 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -901,8 +901,10 @@  bool page_pool_return_skb_page(struct page *page, bool napi_safe)
 	/* Allow direct recycle if we have reasons to believe that we are
 	 * in the same context as the consumer would run, so there's
 	 * no possible race.
+	 * __page_pool_put_page() makes sure we're not in hardirq context
+	 * and interrupts are enabled prior to accessing the cache.
 	 */
-	if (napi_safe) {
+	if (napi_safe || in_softirq()) {
 		const struct napi_struct *napi = READ_ONCE(pp->p.napi);
 
 		allow_direct = napi &&