diff mbox series

skbuff: Fix a race between coalescing and releasing SKBs

Message ID 20230404074733.22869-1-liangchen.linux@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series skbuff: Fix a race between coalescing and releasing SKBs | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 20 this patch: 20
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/checkpatch warning CHECK: Logical continuations should be on the previous line CHECK: Unnecessary parentheses around 'to->pp_recycle != from->pp_recycle' WARNING: 'prohibitted' may be misspelled - perhaps 'prohibited'?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Liang Chen April 4, 2023, 7:47 a.m. UTC
Commit 1effe8ca4e34 allowed coalescing to proceed with non page pool page
and page pool page when @from is cloned, i.e.

to->pp_recycle    --> false
from->pp_recycle  --> true
skb_cloned(from)  --> true

However, it actually requires skb_cloned(@from) to hold true until
coalescing finishes in this situation. If the other cloned SKB is
released while the merging is in process, from_shinfo->nr_frags will be
set to 0 toward the end of the function, causing the increment of frag
page _refcount to be unexpectedly skipped resulting in inconsistent
reference counts. Later when SKB(@to) is released, it frees the page
directly even though the page pool page is still in use, leading to
use-after-free or double-free errors. So it should be prohibitted.

The double-free error message below prompted us to investigate:
BUG: Bad page state in process swapper/1  pfn:0e0d1
page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
index:0x2 pfn:0xe0d1
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount

CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
Call Trace:
 <IRQ>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_pcp_prepare+0x260/0x2f0
free_unref_page+0x20/0x1c0
skb_release_data+0x10b/0x1a0
napi_consume_skb+0x56/0x150
net_rx_action+0xf0/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xc8/0x2b1
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x82/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0xb/0x20

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
 net/core/skbuff.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

Comments

Yunsheng Lin April 4, 2023, 11:18 a.m. UTC | #1
On 2023/4/4 15:47, Liang Chen wrote:
> Commit 1effe8ca4e34 allowed coalescing to proceed with non page pool page
> and page pool page when @from is cloned, i.e.
> 
> to->pp_recycle    --> false
> from->pp_recycle  --> true
> skb_cloned(from)  --> true
> 
> However, it actually requires skb_cloned(@from) to hold true until
> coalescing finishes in this situation. If the other cloned SKB is
> released while the merging is in process, from_shinfo->nr_frags will be
> set to 0 toward the end of the function, causing the increment of frag
> page _refcount to be unexpectedly skipped resulting in inconsistent
> reference counts. Later when SKB(@to) is released, it frees the page
> directly even though the page pool page is still in use, leading to
> use-after-free or double-free errors. So it should be prohibitted.

Nice catch.

> 
> The double-free error message below prompted us to investigate:
> BUG: Bad page state in process swapper/1  pfn:0e0d1
> page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
> index:0x2 pfn:0xe0d1
> flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
> raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
> page dumped because: nonzero _refcount
> 
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
> Call Trace:
>  <IRQ>
> dump_stack_lvl+0x32/0x50
> bad_page+0x69/0xf0
> free_pcp_prepare+0x260/0x2f0
> free_unref_page+0x20/0x1c0
> skb_release_data+0x10b/0x1a0
> napi_consume_skb+0x56/0x150
> net_rx_action+0xf0/0x350
> ? __napi_schedule+0x79/0x90
> __do_softirq+0xc8/0x2b1
> __irq_exit_rcu+0xb9/0xf0
> common_interrupt+0x82/0xa0
> </IRQ>
> <TASK>
> asm_common_interrupt+0x22/0x40
> RIP: 0010:default_idle+0xb/0x20

A Fixes tag is needed for the backport of the fix.
Fixes: 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment recycling")

And target the net branch for bugfix:
 [PATCH net] skbuff: Fix a race between coalescing and releasing SKBs

Otherwise, it looks good to me:
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>

> 
> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
> ---
>  net/core/skbuff.c | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 050a875d09c5..9be23ece5f03 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5598,17 +5598,14 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  		return false;
>  
>  	/* In general, avoid mixing slab allocated and page_pool allocated
> -	 * pages within the same SKB. However when @to is not pp_recycle and
> -	 * @from is cloned, we can transition frag pages from page_pool to
> -	 * reference counted.
> -	 *
> -	 * On the other hand, don't allow coalescing two pp_recycle SKBs if
> -	 * @from is cloned, in case the SKB is using page_pool fragment
> -	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
> -	 * references for cloned SKBs at the moment that would result in
> -	 * inconsistent reference counts.
> +	 * pages within the same SKB. However don't allow coalescing two
> +	 * pp_recycle SKBs if @from is cloned, in case the SKB is using
> +	 * page_pool fragment references (PP_FLAG_PAGE_FRAG). Since we only
> +	 * take full page references for cloned SKBs at the moment that would
> +	 * result in inconsistent reference counts.
>  	 */
> -	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
> +	if ((to->pp_recycle != from->pp_recycle)
> +		|| (from->pp_recycle && skb_cloned(from)))
>  		return false;
>  
>  	if (len <= skb_tailroom(to)) {
>
Alexander Duyck April 4, 2023, 3:51 p.m. UTC | #2
On Tue, 2023-04-04 at 15:47 +0800, Liang Chen wrote:
> Commit 1effe8ca4e34 allowed coalescing to proceed with non page pool page
> and page pool page when @from is cloned, i.e.
> 
> to->pp_recycle    --> false
> from->pp_recycle  --> true
> skb_cloned(from)  --> true
> 
> However, it actually requires skb_cloned(@from) to hold true until
> coalescing finishes in this situation. If the other cloned SKB is
> released while the merging is in process, from_shinfo->nr_frags will be
> set to 0 toward the end of the function, causing the increment of frag
> page _refcount to be unexpectedly skipped resulting in inconsistent
> reference counts. Later when SKB(@to) is released, it frees the page
> directly even though the page pool page is still in use, leading to
> use-after-free or double-free errors. So it should be prohibitted.
> 
> The double-free error message below prompted us to investigate:
> BUG: Bad page state in process swapper/1  pfn:0e0d1
> page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
> index:0x2 pfn:0xe0d1
> flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
> raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
> page dumped because: nonzero _refcount
> 
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
> Call Trace:
>  <IRQ>
> dump_stack_lvl+0x32/0x50
> bad_page+0x69/0xf0
> free_pcp_prepare+0x260/0x2f0
> free_unref_page+0x20/0x1c0
> skb_release_data+0x10b/0x1a0
> napi_consume_skb+0x56/0x150
> net_rx_action+0xf0/0x350
> ? __napi_schedule+0x79/0x90
> __do_softirq+0xc8/0x2b1
> __irq_exit_rcu+0xb9/0xf0
> common_interrupt+0x82/0xa0
> </IRQ>
> <TASK>
> asm_common_interrupt+0x22/0x40
> RIP: 0010:default_idle+0xb/0x20
> 
> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
> ---
>  net/core/skbuff.c | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 050a875d09c5..9be23ece5f03 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5598,17 +5598,14 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  		return false;
>  
>  	/* In general, avoid mixing slab allocated and page_pool allocated
> -	 * pages within the same SKB. However when @to is not pp_recycle and
> -	 * @from is cloned, we can transition frag pages from page_pool to
> -	 * reference counted.
> -	 *
> -	 * On the other hand, don't allow coalescing two pp_recycle SKBs if
> -	 * @from is cloned, in case the SKB is using page_pool fragment
> -	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
> -	 * references for cloned SKBs at the moment that would result in
> -	 * inconsistent reference counts.
> +	 * pages within the same SKB. However don't allow coalescing two
> +	 * pp_recycle SKBs if @from is cloned, in case the SKB is using
> +	 * page_pool fragment references (PP_FLAG_PAGE_FRAG). Since we only
> +	 * take full page references for cloned SKBs at the moment that would
> +	 * result in inconsistent reference counts.
>  	 */
> -	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
> +	if ((to->pp_recycle != from->pp_recycle)
> +		|| (from->pp_recycle && skb_cloned(from)))
>  		return false;
>  
>  	if (len <= skb_tailroom(to)) {

I'm not quite sure I agree with the fix. Couldn't we just modify the
check further down that does:

        if (!skb_cloned(from))
                from_shinfo->nr_frags = 0;

And instead just make that:
	if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle))
                from_shinfo->nr_frags = 0;

With that we would retain the existing behavior and in the case of
cloned from frames we would take the references and let the original
from skb freed to take care of pulling the pages from the page pool.
Jakub Kicinski April 5, 2023, 1:21 a.m. UTC | #3
On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote:
> I'm not quite sure I agree with the fix. Couldn't we just modify the
> check further down that does:
> 
>         if (!skb_cloned(from))
>                 from_shinfo->nr_frags = 0;
> 
> And instead just make that:
> 	if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle))
>                 from_shinfo->nr_frags = 0;
> 
> With that we would retain the existing behavior and in the case of
> cloned from frames we would take the references and let the original
> from skb freed to take care of pulling the pages from the page pool.

Sounds like a better fix, indeed. But this sort of code will require
another fat comment above to explain why. This:

	if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))

is much easier to understand, no?

We should at least include that in the explanatory comment, I reckon...
Liang Chen April 5, 2023, 8:18 a.m. UTC | #4
On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote:
> > I'm not quite sure I agree with the fix. Couldn't we just modify the
> > check further down that does:
> >
> >         if (!skb_cloned(from))
> >                 from_shinfo->nr_frags = 0;
> >
> > And instead just make that:
> >       if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle))
> >                 from_shinfo->nr_frags = 0;
> >
> > With that we would retain the existing behavior and in the case of
> > cloned from frames we would take the references and let the original
> > from skb freed to take care of pulling the pages from the page pool.
>
> Sounds like a better fix, indeed. But this sort of code will require
> another fat comment above to explain why. This:
>
>         if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
>
> is much easier to understand, no?
>
> We should at least include that in the explanatory comment, I reckon...

Sure, the idea of dealing with the case where @from transitioned into non cloned
skb in the function retains the existing behavior, and gives more
opportunities to
coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough
here.

I will take a closer look at the code path for the fragstolen case
before making v2
patch  -  If @from transitioned into non cloned skb before "if
(skb_head_is_locked(from))"

Thanks for the reviews.
Jakub Kicinski April 5, 2023, 2:50 p.m. UTC | #5
On Wed, 5 Apr 2023 16:18:47 +0800 Liang Chen wrote:
> > Sounds like a better fix, indeed. But this sort of code will require
> > another fat comment above to explain why. This:
> >
> >         if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> >
> > is much easier to understand, no?
> >
> > We should at least include that in the explanatory comment, I reckon...  
> 
> Sure, the idea of dealing with the case where @from transitioned into non cloned
> skb in the function retains the existing behavior, and gives more
> opportunities to
> coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough
> here.

Well, that's pretty much what Alex suggested minus the optimization he
put in for "was never cloned" which is probably worth having. So if
you're gonna do this just use his code.

My point was that !from->pp_recycle requires the reader to understand
the relationship between this check and the previous condition at entry.
While to->pp_recycle == from->pp_recycle seems much more obvious to me -
directly shifting frags between skbs with different refcount styles is
dangerous.

Maybe it's just me, so whatever.
Make sure you write a good comment.

> I will take a closer look at the code path for the fragstolen case
> before making v2
> patch  -  If @from transitioned into non cloned skb before "if
> (skb_head_is_locked(from))"
> 
> Thanks for the reviews.
Alexander Duyck April 5, 2023, 3:06 p.m. UTC | #6
On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote:
>
> On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote:
> > > I'm not quite sure I agree with the fix. Couldn't we just modify the
> > > check further down that does:
> > >
> > >         if (!skb_cloned(from))
> > >                 from_shinfo->nr_frags = 0;
> > >
> > > And instead just make that:
> > >       if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle))
> > >                 from_shinfo->nr_frags = 0;
> > >
> > > With that we would retain the existing behavior and in the case of
> > > cloned from frames we would take the references and let the original
> > > from skb freed to take care of pulling the pages from the page pool.
> >
> > Sounds like a better fix, indeed. But this sort of code will require
> > another fat comment above to explain why. This:
> >
> >         if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> >
> > is much easier to understand, no?
> >
> > We should at least include that in the explanatory comment, I reckon...
>
> Sure, the idea of dealing with the case where @from transitioned into non cloned
> skb in the function retains the existing behavior, and gives more
> opportunities to
> coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough
> here.
> I will take a closer look at the code path for the fragstolen case
> before making v2
> patch  -  If @from transitioned into non cloned skb before "if
> (skb_head_is_locked(from))"
>
> Thanks for the reviews.

Actually I am not sure that works now that I look at it closer. The
problem with using (!skb_cloned(from) && !from->pp_recycle) is that it
breaks the case where both from and to are pp_recycle without being
cloned.

So it probably needs to be something actually the setup Jakub
suggested would probably work better:
  if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))

Basically we just have to guarantee that if we are copying frags over
the frag destructor is the same for the origin and destination.
Otherwise we can take a reference and convert them to being reference
counted.
Liang Chen April 6, 2023, 3:22 a.m. UTC | #7
On Wed, Apr 5, 2023 at 10:50 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 5 Apr 2023 16:18:47 +0800 Liang Chen wrote:
> > > Sounds like a better fix, indeed. But this sort of code will require
> > > another fat comment above to explain why. This:
> > >
> > >         if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> > >
> > > is much easier to understand, no?
> > >
> > > We should at least include that in the explanatory comment, I reckon...
> >
> > Sure, the idea of dealing with the case where @from transitioned into non cloned
> > skb in the function retains the existing behavior, and gives more
> > opportunities to
> > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough
> > here.
>
> Well, that's pretty much what Alex suggested minus the optimization he
> put in for "was never cloned" which is probably worth having. So if
> you're gonna do this just use his code.
>
> My point was that !from->pp_recycle requires the reader to understand
> the relationship between this check and the previous condition at entry.
> While to->pp_recycle == from->pp_recycle seems much more obvious to me -
> directly shifting frags between skbs with different refcount styles is
> dangerous.
>

Yeah, I agree with the point that to->pp_recycle == from->pp_recycle
is easier to understand, and will use it in the next iteration of the
patch. Thanks!

> Maybe it's just me, so whatever.
> Make sure you write a good comment.

Sure.
>
> > I will take a closer look at the code path for the fragstolen case
> > before making v2
> > patch  -  If @from transitioned into non cloned skb before "if
> > (skb_head_is_locked(from))"
> >

I took a closer look at the code path for the "fragstolen" case, and
it indeed requires a special handle for the situation addressed in
this patch. Something like,

    if( to->pp_recycle != from->pp_recycle )
        get_page(page);

before  "*fragstolen = true;".   But this makes the logic a bit
complicated. Anyway, I will include the logic in the v2 patch. Let's
see if it looks better.

> > Thanks for the reviews.
Liang Chen April 6, 2023, 3:28 a.m. UTC | #8
On Wed, Apr 5, 2023 at 11:06 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote:
> >
> > On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote:
> > > > I'm not quite sure I agree with the fix. Couldn't we just modify the
> > > > check further down that does:
> > > >
> > > >         if (!skb_cloned(from))
> > > >                 from_shinfo->nr_frags = 0;
> > > >
> > > > And instead just make that:
> > > >       if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle))
> > > >                 from_shinfo->nr_frags = 0;
> > > >
> > > > With that we would retain the existing behavior and in the case of
> > > > cloned from frames we would take the references and let the original
> > > > from skb freed to take care of pulling the pages from the page pool.
> > >
> > > Sounds like a better fix, indeed. But this sort of code will require
> > > another fat comment above to explain why. This:
> > >
> > >         if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> > >
> > > is much easier to understand, no?
> > >
> > > We should at least include that in the explanatory comment, I reckon...
> >
> > Sure, the idea of dealing with the case where @from transitioned into non cloned
> > skb in the function retains the existing behavior, and gives more
> > opportunities to
> > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough
> > here.
> > I will take a closer look at the code path for the fragstolen case
> > before making v2
> > patch  -  If @from transitioned into non cloned skb before "if
> > (skb_head_is_locked(from))"
> >
> > Thanks for the reviews.
>
> Actually I am not sure that works now that I look at it closer. The
> problem with using (!skb_cloned(from) && !from->pp_recycle) is that it
> breaks the case where both from and to are pp_recycle without being
> cloned.

Yeah, it would break that case. Thanks!
>
> So it probably needs to be something actually the setup Jakub
> suggested would probably work better:
>   if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
>

I agree. That's better.
> Basically we just have to guarantee that if we are copying frags over
> the frag destructor is the same for the origin and destination.
> Otherwise we can take a reference and convert them to being reference
> counted.
Eric Dumazet April 6, 2023, 9:56 a.m. UTC | #9
On Thu, Apr 6, 2023 at 5:28 AM Liang Chen <liangchen.linux@gmail.com> wrote:
>
> On Wed, Apr 5, 2023 at 11:06 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote:
> > >
> > > On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > >
> > > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote:
> > > > > I'm not quite sure I agree with the fix. Couldn't we just modify the
> > > > > check further down that does:
> > > > >
> > > > >         if (!skb_cloned(from))
> > > > >                 from_shinfo->nr_frags = 0;
> > > > >
> > > > > And instead just make that:
> > > > >       if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle))
> > > > >                 from_shinfo->nr_frags = 0;
> > > > >
> > > > > With that we would retain the existing behavior and in the case of
> > > > > cloned from frames we would take the references and let the original
> > > > > from skb freed to take care of pulling the pages from the page pool.
> > > >
> > > > Sounds like a better fix, indeed. But this sort of code will require
> > > > another fat comment above to explain why. This:
> > > >
> > > >         if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> > > >
> > > > is much easier to understand, no?
> > > >
> > > > We should at least include that in the explanatory comment, I reckon...
> > >
> > > Sure, the idea of dealing with the case where @from transitioned into non cloned
> > > skb in the function retains the existing behavior, and gives more
> > > opportunities to
> > > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough
> > > here.
> > > I will take a closer look at the code path for the fragstolen case
> > > before making v2
> > > patch  -  If @from transitioned into non cloned skb before "if
> > > (skb_head_is_locked(from))"
> > >
> > > Thanks for the reviews.
> >
> > Actually I am not sure that works now that I look at it closer. The
> > problem with using (!skb_cloned(from) && !from->pp_recycle) is that it
> > breaks the case where both from and to are pp_recycle without being
> > cloned.
>
> Yeah, it would break that case. Thanks!
> >
> > So it probably needs to be something actually the setup Jakub
> > suggested would probably work better:
> >   if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> >
>
> I agree. That's better.

Same feeling on my side.
I prefer not trying to merge mixed pp_recycle skbs "just because we
could" at the expense
of adding more code in a fast path.
Ilias Apalodimas April 6, 2023, 10:46 a.m. UTC | #10
On Thu, 6 Apr 2023 at 12:56, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Apr 6, 2023 at 5:28 AM Liang Chen <liangchen.linux@gmail.com> wrote:
> >
> > On Wed, Apr 5, 2023 at 11:06 PM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote:
> > > >
> > > > On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > >
> > > > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote:
> > > > > > I'm not quite sure I agree with the fix. Couldn't we just modify the
> > > > > > check further down that does:
> > > > > >
> > > > > >         if (!skb_cloned(from))
> > > > > >                 from_shinfo->nr_frags = 0;
> > > > > >
> > > > > > And instead just make that:
> > > > > >       if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle))
> > > > > >                 from_shinfo->nr_frags = 0;
> > > > > >
> > > > > > With that we would retain the existing behavior and in the case of
> > > > > > cloned from frames we would take the references and let the original
> > > > > > from skb freed to take care of pulling the pages from the page pool.
> > > > >
> > > > > Sounds like a better fix, indeed. But this sort of code will require
> > > > > another fat comment above to explain why. This:
> > > > >
> > > > >         if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> > > > >
> > > > > is much easier to understand, no?
> > > > >
> > > > > We should at least include that in the explanatory comment, I reckon...
> > > >
> > > > Sure, the idea of dealing with the case where @from transitioned into non cloned
> > > > skb in the function retains the existing behavior, and gives more
> > > > opportunities to
> > > > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough
> > > > here.
> > > > I will take a closer look at the code path for the fragstolen case
> > > > before making v2
> > > > patch  -  If @from transitioned into non cloned skb before "if
> > > > (skb_head_is_locked(from))"
> > > >
> > > > Thanks for the reviews.
> > >
> > > Actually I am not sure that works now that I look at it closer. The
> > > problem with using (!skb_cloned(from) && !from->pp_recycle) is that it
> > > breaks the case where both from and to are pp_recycle without being
> > > cloned.
> >
> > Yeah, it would break that case. Thanks!
> > >
> > > So it probably needs to be something actually the setup Jakub
> > > suggested would probably work better:
> > >   if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> > >
> >
> > I agree. That's better.
>
> Same feeling on my side.
> I prefer not trying to merge mixed pp_recycle skbs "just because we
> could" at the expense
> of adding more code in a fast path.

+1 here.  The intention of recycling was to affect the normal path as
less as possible.  On top of that, we've some amount of race
conditions over the years, trying to squeeze more performance with
similar tricks.  I'd much rather be safe here, since recycling by
itself is a great performance boost

Regards
/Ilias
Liang Chen April 6, 2023, 11:54 a.m. UTC | #11
On Thu, Apr 6, 2023 at 6:46 PM Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> On Thu, 6 Apr 2023 at 12:56, Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Apr 6, 2023 at 5:28 AM Liang Chen <liangchen.linux@gmail.com> wrote:
> > >
> > > On Wed, Apr 5, 2023 at 11:06 PM Alexander Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > >
> > > > On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote:
> > > > >
> > > > > On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > >
> > > > > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote:
> > > > > > > I'm not quite sure I agree with the fix. Couldn't we just modify the
> > > > > > > check further down that does:
> > > > > > >
> > > > > > >         if (!skb_cloned(from))
> > > > > > >                 from_shinfo->nr_frags = 0;
> > > > > > >
> > > > > > > And instead just make that:
> > > > > > >       if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle))
> > > > > > >                 from_shinfo->nr_frags = 0;
> > > > > > >
> > > > > > > With that we would retain the existing behavior and in the case of
> > > > > > > cloned from frames we would take the references and let the original
> > > > > > > from skb freed to take care of pulling the pages from the page pool.
> > > > > >
> > > > > > Sounds like a better fix, indeed. But this sort of code will require
> > > > > > another fat comment above to explain why. This:
> > > > > >
> > > > > >         if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> > > > > >
> > > > > > is much easier to understand, no?
> > > > > >
> > > > > > We should at least include that in the explanatory comment, I reckon...
> > > > >
> > > > > Sure, the idea of dealing with the case where @from transitioned into non cloned
> > > > > skb in the function retains the existing behavior, and gives more
> > > > > opportunities to
> > > > > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough
> > > > > here.
> > > > > I will take a closer look at the code path for the fragstolen case
> > > > > before making v2
> > > > > patch  -  If @from transitioned into non cloned skb before "if
> > > > > (skb_head_is_locked(from))"
> > > > >
> > > > > Thanks for the reviews.
> > > >
> > > > Actually I am not sure that works now that I look at it closer. The
> > > > problem with using (!skb_cloned(from) && !from->pp_recycle) is that it
> > > > breaks the case where both from and to are pp_recycle without being
> > > > cloned.
> > >
> > > Yeah, it would break that case. Thanks!
> > > >
> > > > So it probably needs to be something actually the setup Jakub
> > > > suggested would probably work better:
> > > >   if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
> > > >
> > >
> > > I agree. That's better.
> >
> > Same feeling on my side.
> > I prefer not trying to merge mixed pp_recycle skbs "just because we
> > could" at the expense
> > of adding more code in a fast path.
>
> +1 here.  The intention of recycling was to affect the normal path as
> less as possible.  On top of that, we've some amount of race
> conditions over the years, trying to squeeze more performance with
> similar tricks.  I'd much rather be safe here, since recycling by
> itself is a great performance boost
>
> Regards
> /Ilias

Sorry, I didn't check my inbox before sending out the v2 patch.

Yeah, It is a bit complicated as we expected. The patch is sent out.
Please take a look to see if it is the way to go, or We should stay
with the current patch for simplicity reasons. Thanks!
Jakub Kicinski April 6, 2023, 2:59 p.m. UTC | #12
On Thu, 6 Apr 2023 19:54:23 +0800 Liang Chen wrote:
> > > Same feeling on my side.
> > > I prefer not trying to merge mixed pp_recycle skbs "just because we
> > > could" at the expense
> > > of adding more code in a fast path.  
> >
> > +1 here.  The intention of recycling was to affect the normal path as
> > less as possible.  On top of that, we've some amount of race
> > conditions over the years, trying to squeeze more performance with
> > similar tricks.  I'd much rather be safe here, since recycling by
> > itself is a great performance boost
> 
> Sorry, I didn't check my inbox before sending out the v2 patch.

I can discard v2 from patchwork, let's continue the conversation
here.

> Yeah, It is a bit complicated as we expected. The patch is sent out.
> Please take a look to see if it is the way to go, or We should stay
> with the current patch for simplicity reasons. Thanks!

Sounds like you know what Eric and Ilias agreed with, I'm a bit
confused.. are we basically going back to v1? (hopefully with coding
style fixed)
Liang Chen April 7, 2023, 12:45 a.m. UTC | #13
On Thu, Apr 6, 2023 at 10:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 6 Apr 2023 19:54:23 +0800 Liang Chen wrote:
> > > > Same feeling on my side.
> > > > I prefer not trying to merge mixed pp_recycle skbs "just because we
> > > > could" at the expense
> > > > of adding more code in a fast path.
> > >
> > > +1 here.  The intention of recycling was to affect the normal path as
> > > less as possible.  On top of that, we've some amount of race
> > > conditions over the years, trying to squeeze more performance with
> > > similar tricks.  I'd much rather be safe here, since recycling by
> > > itself is a great performance boost
> >
> > Sorry, I didn't check my inbox before sending out the v2 patch.
>
> I can discard v2 from patchwork, let's continue the conversation
> here.
>
> > Yeah, It is a bit complicated as we expected. The patch is sent out.
> > Please take a look to see if it is the way to go, or We should stay
> > with the current patch for simplicity reasons. Thanks!
>
> Sounds like you know what Eric and Ilias agreed with, I'm a bit
> confused.. are we basically going back to v1? (hopefully with coding
> style fixed)

Sorry for making the confusion. I just felt the v2 patch complicated
the code a bit.  I am not quite sure if the performance gain justifies
the added complexity, and would really appreciate it if you could make
the decision on which patch to pick up. Thanks a lot!
Jakub Kicinski April 11, 2023, 12:26 a.m. UTC | #14
On Tue,  4 Apr 2023 15:47:33 +0800 Liang Chen wrote:
> -	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
> +	if ((to->pp_recycle != from->pp_recycle)
> +		|| (from->pp_recycle && skb_cloned(from)))
>  		return false;

It should be formatted like this:

	if (to->pp_recycle != from->pp_recycle ||
	    (from->pp_recycle && skb_cloned(from)))
Liang Chen April 12, 2023, 7:27 a.m. UTC | #15
Sure. I have addressed it and submitted the updated patch for your
review as v3. Thank you for pointing it out.

Thanks,
Liang

On Tue, Apr 11, 2023 at 8:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue,  4 Apr 2023 15:47:33 +0800 Liang Chen wrote:
> > -     if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
> > +     if ((to->pp_recycle != from->pp_recycle)
> > +             || (from->pp_recycle && skb_cloned(from)))
> >               return false;
>
> It should be formatted like this:
>
>         if (to->pp_recycle != from->pp_recycle ||
>             (from->pp_recycle && skb_cloned(from)))
Jakub Kicinski April 12, 2023, 1:58 p.m. UTC | #16
On Wed, 12 Apr 2023 15:27:13 +0800 Liang Chen wrote:
> Sure. I have addressed it and submitted the updated patch for your
> review as v3. Thank you for pointing it out.

I know, I don't understand why you have to send this note, I can see
the patch. And it's still in patchwork which as you know (as I trust
you read our process documentation) means it's going to be reviewed.
The judeo-christiano-muslim world is going thru its spring celebrations
so the reviews will be slower than usual. 

Please be patient.

And please don't top post.
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 050a875d09c5..9be23ece5f03 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5598,17 +5598,14 @@  bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 		return false;
 
 	/* In general, avoid mixing slab allocated and page_pool allocated
-	 * pages within the same SKB. However when @to is not pp_recycle and
-	 * @from is cloned, we can transition frag pages from page_pool to
-	 * reference counted.
-	 *
-	 * On the other hand, don't allow coalescing two pp_recycle SKBs if
-	 * @from is cloned, in case the SKB is using page_pool fragment
-	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
-	 * references for cloned SKBs at the moment that would result in
-	 * inconsistent reference counts.
+	 * pages within the same SKB. However don't allow coalescing two
+	 * pp_recycle SKBs if @from is cloned, in case the SKB is using
+	 * page_pool fragment references (PP_FLAG_PAGE_FRAG). Since we only
+	 * take full page references for cloned SKBs at the moment that would
+	 * result in inconsistent reference counts.
 	 */
-	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
+	if ((to->pp_recycle != from->pp_recycle)
+		|| (from->pp_recycle && skb_cloned(from)))
 		return false;
 
 	if (len <= skb_tailroom(to)) {