Message ID | 20230404074733.22869-1-liangchen.linux@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | skbuff: Fix a race between coalescing and releasing SKBs | expand |
On 2023/4/4 15:47, Liang Chen wrote: > Commit 1effe8ca4e34 allowed coalescing to proceed with non page pool page > and page pool page when @from is cloned, i.e. > > to->pp_recycle --> false > from->pp_recycle --> true > skb_cloned(from) --> true > > However, it actually requires skb_cloned(@from) to hold true until > coalescing finishes in this situation. If the other cloned SKB is > released while the merging is in process, from_shinfo->nr_frags will be > set to 0 toward the end of the function, causing the increment of frag > page _refcount to be unexpectedly skipped resulting in inconsistent > reference counts. Later when SKB(@to) is released, it frees the page > directly even though the page pool page is still in use, leading to > use-after-free or double-free errors. So it should be prohibitted. Nice catch. > > The double-free error message below prompted us to investigate: > BUG: Bad page state in process swapper/1 pfn:0e0d1 > page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000 > index:0x2 pfn:0xe0d1 > flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) > raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000 > raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000 > page dumped because: nonzero _refcount > > CPU: 1 PID: 0 Comm: swapper/1 Tainted: G E 6.2.0+ > Call Trace: > <IRQ> > dump_stack_lvl+0x32/0x50 > bad_page+0x69/0xf0 > free_pcp_prepare+0x260/0x2f0 > free_unref_page+0x20/0x1c0 > skb_release_data+0x10b/0x1a0 > napi_consume_skb+0x56/0x150 > net_rx_action+0xf0/0x350 > ? __napi_schedule+0x79/0x90 > __do_softirq+0xc8/0x2b1 > __irq_exit_rcu+0xb9/0xf0 > common_interrupt+0x82/0xa0 > </IRQ> > <TASK> > asm_common_interrupt+0x22/0x40 > RIP: 0010:default_idle+0xb/0x20 A Fixes tag is needed for the backport of the fix. Fixes: 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment recycling") And target the net branch for bugfix: [PATCH net] skbuff: Fix a race between coalescing and releasing SKBs Otherwise, it looks good to me: Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> > > Signed-off-by: Liang Chen <liangchen.linux@gmail.com> > --- > net/core/skbuff.c | 17 +++++++---------- > 1 file changed, 7 insertions(+), 10 deletions(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 050a875d09c5..9be23ece5f03 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -5598,17 +5598,14 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, > return false; > > /* In general, avoid mixing slab allocated and page_pool allocated > - * pages within the same SKB. However when @to is not pp_recycle and > - * @from is cloned, we can transition frag pages from page_pool to > - * reference counted. > - * > - * On the other hand, don't allow coalescing two pp_recycle SKBs if > - * @from is cloned, in case the SKB is using page_pool fragment > - * references (PP_FLAG_PAGE_FRAG). Since we only take full page > - * references for cloned SKBs at the moment that would result in > - * inconsistent reference counts. > + * pages within the same SKB. However don't allow coalescing two > + * pp_recycle SKBs if @from is cloned, in case the SKB is using > + * page_pool fragment references (PP_FLAG_PAGE_FRAG). Since we only > + * take full page references for cloned SKBs at the moment that would > + * result in inconsistent reference counts. > */ > - if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from))) > + if ((to->pp_recycle != from->pp_recycle) > + || (from->pp_recycle && skb_cloned(from))) > return false; > > if (len <= skb_tailroom(to)) { >
On Tue, 2023-04-04 at 15:47 +0800, Liang Chen wrote: > Commit 1effe8ca4e34 allowed coalescing to proceed with non page pool page > and page pool page when @from is cloned, i.e. > > to->pp_recycle --> false > from->pp_recycle --> true > skb_cloned(from) --> true > > However, it actually requires skb_cloned(@from) to hold true until > coalescing finishes in this situation. If the other cloned SKB is > released while the merging is in process, from_shinfo->nr_frags will be > set to 0 toward the end of the function, causing the increment of frag > page _refcount to be unexpectedly skipped resulting in inconsistent > reference counts. Later when SKB(@to) is released, it frees the page > directly even though the page pool page is still in use, leading to > use-after-free or double-free errors. So it should be prohibitted. > > The double-free error message below prompted us to investigate: > BUG: Bad page state in process swapper/1 pfn:0e0d1 > page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000 > index:0x2 pfn:0xe0d1 > flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) > raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000 > raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000 > page dumped because: nonzero _refcount > > CPU: 1 PID: 0 Comm: swapper/1 Tainted: G E 6.2.0+ > Call Trace: > <IRQ> > dump_stack_lvl+0x32/0x50 > bad_page+0x69/0xf0 > free_pcp_prepare+0x260/0x2f0 > free_unref_page+0x20/0x1c0 > skb_release_data+0x10b/0x1a0 > napi_consume_skb+0x56/0x150 > net_rx_action+0xf0/0x350 > ? __napi_schedule+0x79/0x90 > __do_softirq+0xc8/0x2b1 > __irq_exit_rcu+0xb9/0xf0 > common_interrupt+0x82/0xa0 > </IRQ> > <TASK> > asm_common_interrupt+0x22/0x40 > RIP: 0010:default_idle+0xb/0x20 > > Signed-off-by: Liang Chen <liangchen.linux@gmail.com> > --- > net/core/skbuff.c | 17 +++++++---------- > 1 file changed, 7 insertions(+), 10 deletions(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 050a875d09c5..9be23ece5f03 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -5598,17 +5598,14 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, > return false; > > /* In general, avoid mixing slab allocated and page_pool allocated > - * pages within the same SKB. However when @to is not pp_recycle and > - * @from is cloned, we can transition frag pages from page_pool to > - * reference counted. > - * > - * On the other hand, don't allow coalescing two pp_recycle SKBs if > - * @from is cloned, in case the SKB is using page_pool fragment > - * references (PP_FLAG_PAGE_FRAG). Since we only take full page > - * references for cloned SKBs at the moment that would result in > - * inconsistent reference counts. > + * pages within the same SKB. However don't allow coalescing two > + * pp_recycle SKBs if @from is cloned, in case the SKB is using > + * page_pool fragment references (PP_FLAG_PAGE_FRAG). Since we only > + * take full page references for cloned SKBs at the moment that would > + * result in inconsistent reference counts. > */ > - if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from))) > + if ((to->pp_recycle != from->pp_recycle) > + || (from->pp_recycle && skb_cloned(from))) > return false; > > if (len <= skb_tailroom(to)) { I'm not quite sure I agree with the fix. Couldn't we just modify the check further down that does: if (!skb_cloned(from)) from_shinfo->nr_frags = 0; And instead just make that: if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle)) from_shinfo->nr_frags = 0; With that we would retain the existing behavior and in the case of cloned from frames we would take the references and let the original from skb freed to take care of pulling the pages from the page pool.
On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote: > I'm not quite sure I agree with the fix. Couldn't we just modify the > check further down that does: > > if (!skb_cloned(from)) > from_shinfo->nr_frags = 0; > > And instead just make that: > if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle)) > from_shinfo->nr_frags = 0; > > With that we would retain the existing behavior and in the case of > cloned from frames we would take the references and let the original > from skb freed to take care of pulling the pages from the page pool. Sounds like a better fix, indeed. But this sort of code will require another fat comment above to explain why. This: if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) is much easier to understand, no? We should at least include that in the explanatory comment, I reckon...
On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote: > > I'm not quite sure I agree with the fix. Couldn't we just modify the > > check further down that does: > > > > if (!skb_cloned(from)) > > from_shinfo->nr_frags = 0; > > > > And instead just make that: > > if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle)) > > from_shinfo->nr_frags = 0; > > > > With that we would retain the existing behavior and in the case of > > cloned from frames we would take the references and let the original > > from skb freed to take care of pulling the pages from the page pool. > > Sounds like a better fix, indeed. But this sort of code will require > another fat comment above to explain why. This: > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > is much easier to understand, no? > > We should at least include that in the explanatory comment, I reckon... Sure, the idea of dealing with the case where @from transitioned into non cloned skb in the function retains the existing behavior, and gives more opportunities to coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough here. I will take a closer look at the code path for the fragstolen case before making v2 patch - If @from transitioned into non cloned skb before "if (skb_head_is_locked(from))" Thanks for the reviews.
On Wed, 5 Apr 2023 16:18:47 +0800 Liang Chen wrote: > > Sounds like a better fix, indeed. But this sort of code will require > > another fat comment above to explain why. This: > > > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > is much easier to understand, no? > > > > We should at least include that in the explanatory comment, I reckon... > > Sure, the idea of dealing with the case where @from transitioned into non cloned > skb in the function retains the existing behavior, and gives more > opportunities to > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough > here. Well, that's pretty much what Alex suggested minus the optimization he put in for "was never cloned" which is probably worth having. So if you're gonna do this just use his code. My point was that !from->pp_recycle requires the reader to understand the relationship between this check and the previous condition at entry. While to->pp_recycle == from->pp_recycle seems much more obvious to me - directly shifting frags between skbs with different refcount styles is dangerous. Maybe it's just me, so whatever. Make sure you write a good comment. > I will take a closer look at the code path for the fragstolen case > before making v2 > patch - If @from transitioned into non cloned skb before "if > (skb_head_is_locked(from))" > > Thanks for the reviews.
On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote: > > On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote: > > > I'm not quite sure I agree with the fix. Couldn't we just modify the > > > check further down that does: > > > > > > if (!skb_cloned(from)) > > > from_shinfo->nr_frags = 0; > > > > > > And instead just make that: > > > if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle)) > > > from_shinfo->nr_frags = 0; > > > > > > With that we would retain the existing behavior and in the case of > > > cloned from frames we would take the references and let the original > > > from skb freed to take care of pulling the pages from the page pool. > > > > Sounds like a better fix, indeed. But this sort of code will require > > another fat comment above to explain why. This: > > > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > is much easier to understand, no? > > > > We should at least include that in the explanatory comment, I reckon... > > Sure, the idea of dealing with the case where @from transitioned into non cloned > skb in the function retains the existing behavior, and gives more > opportunities to > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough > here. > I will take a closer look at the code path for the fragstolen case > before making v2 > patch - If @from transitioned into non cloned skb before "if > (skb_head_is_locked(from))" > > Thanks for the reviews. Actually I am not sure that works now that I look at it closer. The problem with using (!skb_cloned(from) && !from->pp_recycle) is that it breaks the case where both from and to are pp_recycle without being cloned. So it probably needs to be something actually the setup Jakub suggested would probably work better: if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) Basically we just have to guarantee that if we are copying frags over the frag destructor is the same for the origin and destination. Otherwise we can take a reference and convert them to being reference counted.
On Wed, Apr 5, 2023 at 10:50 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 5 Apr 2023 16:18:47 +0800 Liang Chen wrote: > > > Sounds like a better fix, indeed. But this sort of code will require > > > another fat comment above to explain why. This: > > > > > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > > > is much easier to understand, no? > > > > > > We should at least include that in the explanatory comment, I reckon... > > > > Sure, the idea of dealing with the case where @from transitioned into non cloned > > skb in the function retains the existing behavior, and gives more > > opportunities to > > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough > > here. > > Well, that's pretty much what Alex suggested minus the optimization he > put in for "was never cloned" which is probably worth having. So if > you're gonna do this just use his code. > > My point was that !from->pp_recycle requires the reader to understand > the relationship between this check and the previous condition at entry. > While to->pp_recycle == from->pp_recycle seems much more obvious to me - > directly shifting frags between skbs with different refcount styles is > dangerous. > Yeah, I agree with the point that to->pp_recycle == from->pp_recycle is easier to understand, and will use it in the next iteration of the patch. Thanks! > Maybe it's just me, so whatever. > Make sure you write a good comment. Sure. > > > I will take a closer look at the code path for the fragstolen case > > before making v2 > > patch - If @from transitioned into non cloned skb before "if > > (skb_head_is_locked(from))" > > I took a closer look at the code path for the "fragstolen" case, and it indeed requires a special handle for the situation addressed in this patch. Something like, if( to->pp_recycle != from->pp_recycle ) get_page(page); before "*fragstolen = true;". But this makes the logic a bit complicated. Anyway, I will include the logic in the v2 patch. Let's see if it looks better. > > Thanks for the reviews.
On Wed, Apr 5, 2023 at 11:06 PM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote: > > > > On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote: > > > > I'm not quite sure I agree with the fix. Couldn't we just modify the > > > > check further down that does: > > > > > > > > if (!skb_cloned(from)) > > > > from_shinfo->nr_frags = 0; > > > > > > > > And instead just make that: > > > > if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle)) > > > > from_shinfo->nr_frags = 0; > > > > > > > > With that we would retain the existing behavior and in the case of > > > > cloned from frames we would take the references and let the original > > > > from skb freed to take care of pulling the pages from the page pool. > > > > > > Sounds like a better fix, indeed. But this sort of code will require > > > another fat comment above to explain why. This: > > > > > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > > > is much easier to understand, no? > > > > > > We should at least include that in the explanatory comment, I reckon... > > > > Sure, the idea of dealing with the case where @from transitioned into non cloned > > skb in the function retains the existing behavior, and gives more > > opportunities to > > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough > > here. > > I will take a closer look at the code path for the fragstolen case > > before making v2 > > patch - If @from transitioned into non cloned skb before "if > > (skb_head_is_locked(from))" > > > > Thanks for the reviews. > > Actually I am not sure that works now that I look at it closer. The > problem with using (!skb_cloned(from) && !from->pp_recycle) is that it > breaks the case where both from and to are pp_recycle without being > cloned. Yeah, it would break that case. Thanks! > > So it probably needs to be something actually the setup Jakub > suggested would probably work better: > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > I agree. That's better. > Basically we just have to guarantee that if we are copying frags over > the frag destructor is the same for the origin and destination. > Otherwise we can take a reference and convert them to being reference > counted.
On Thu, Apr 6, 2023 at 5:28 AM Liang Chen <liangchen.linux@gmail.com> wrote: > > On Wed, Apr 5, 2023 at 11:06 PM Alexander Duyck > <alexander.duyck@gmail.com> wrote: > > > > On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote: > > > > > > On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote: > > > > > I'm not quite sure I agree with the fix. Couldn't we just modify the > > > > > check further down that does: > > > > > > > > > > if (!skb_cloned(from)) > > > > > from_shinfo->nr_frags = 0; > > > > > > > > > > And instead just make that: > > > > > if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle)) > > > > > from_shinfo->nr_frags = 0; > > > > > > > > > > With that we would retain the existing behavior and in the case of > > > > > cloned from frames we would take the references and let the original > > > > > from skb freed to take care of pulling the pages from the page pool. > > > > > > > > Sounds like a better fix, indeed. But this sort of code will require > > > > another fat comment above to explain why. This: > > > > > > > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > > > > > is much easier to understand, no? > > > > > > > > We should at least include that in the explanatory comment, I reckon... > > > > > > Sure, the idea of dealing with the case where @from transitioned into non cloned > > > skb in the function retains the existing behavior, and gives more > > > opportunities to > > > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough > > > here. > > > I will take a closer look at the code path for the fragstolen case > > > before making v2 > > > patch - If @from transitioned into non cloned skb before "if > > > (skb_head_is_locked(from))" > > > > > > Thanks for the reviews. > > > > Actually I am not sure that works now that I look at it closer. The > > problem with using (!skb_cloned(from) && !from->pp_recycle) is that it > > breaks the case where both from and to are pp_recycle without being > > cloned. > > Yeah, it would break that case. Thanks! > > > > So it probably needs to be something actually the setup Jakub > > suggested would probably work better: > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > I agree. That's better. Same feeling on my side. I prefer not trying to merge mixed pp_recycle skbs "just because we could" at the expense of adding more code in a fast path.
On Thu, 6 Apr 2023 at 12:56, Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Apr 6, 2023 at 5:28 AM Liang Chen <liangchen.linux@gmail.com> wrote: > > > > On Wed, Apr 5, 2023 at 11:06 PM Alexander Duyck > > <alexander.duyck@gmail.com> wrote: > > > > > > On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote: > > > > > > > > On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > > > > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote: > > > > > > I'm not quite sure I agree with the fix. Couldn't we just modify the > > > > > > check further down that does: > > > > > > > > > > > > if (!skb_cloned(from)) > > > > > > from_shinfo->nr_frags = 0; > > > > > > > > > > > > And instead just make that: > > > > > > if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle)) > > > > > > from_shinfo->nr_frags = 0; > > > > > > > > > > > > With that we would retain the existing behavior and in the case of > > > > > > cloned from frames we would take the references and let the original > > > > > > from skb freed to take care of pulling the pages from the page pool. > > > > > > > > > > Sounds like a better fix, indeed. But this sort of code will require > > > > > another fat comment above to explain why. This: > > > > > > > > > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > > > > > > > is much easier to understand, no? > > > > > > > > > > We should at least include that in the explanatory comment, I reckon... > > > > > > > > Sure, the idea of dealing with the case where @from transitioned into non cloned > > > > skb in the function retains the existing behavior, and gives more > > > > opportunities to > > > > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough > > > > here. > > > > I will take a closer look at the code path for the fragstolen case > > > > before making v2 > > > > patch - If @from transitioned into non cloned skb before "if > > > > (skb_head_is_locked(from))" > > > > > > > > Thanks for the reviews. > > > > > > Actually I am not sure that works now that I look at it closer. The > > > problem with using (!skb_cloned(from) && !from->pp_recycle) is that it > > > breaks the case where both from and to are pp_recycle without being > > > cloned. > > > > Yeah, it would break that case. Thanks! > > > > > > So it probably needs to be something actually the setup Jakub > > > suggested would probably work better: > > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > > > > I agree. That's better. > > Same feeling on my side. > I prefer not trying to merge mixed pp_recycle skbs "just because we > could" at the expense > of adding more code in a fast path. +1 here. The intention of recycling was to affect the normal path as less as possible. On top of that, we've some amount of race conditions over the years, trying to squeeze more performance with similar tricks. I'd much rather be safe here, since recycling by itself is a great performance boost Regards /Ilias
On Thu, Apr 6, 2023 at 6:46 PM Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote: > > On Thu, 6 Apr 2023 at 12:56, Eric Dumazet <edumazet@google.com> wrote: > > > > On Thu, Apr 6, 2023 at 5:28 AM Liang Chen <liangchen.linux@gmail.com> wrote: > > > > > > On Wed, Apr 5, 2023 at 11:06 PM Alexander Duyck > > > <alexander.duyck@gmail.com> wrote: > > > > > > > > On Wed, Apr 5, 2023 at 1:19 AM Liang Chen <liangchen.linux@gmail.com> wrote: > > > > > > > > > > On Wed, Apr 5, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > > > > > > > On Tue, 04 Apr 2023 08:51:18 -0700 Alexander H Duyck wrote: > > > > > > > I'm not quite sure I agree with the fix. Couldn't we just modify the > > > > > > > check further down that does: > > > > > > > > > > > > > > if (!skb_cloned(from)) > > > > > > > from_shinfo->nr_frags = 0; > > > > > > > > > > > > > > And instead just make that: > > > > > > > if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle)) > > > > > > > from_shinfo->nr_frags = 0; > > > > > > > > > > > > > > With that we would retain the existing behavior and in the case of > > > > > > > cloned from frames we would take the references and let the original > > > > > > > from skb freed to take care of pulling the pages from the page pool. > > > > > > > > > > > > Sounds like a better fix, indeed. But this sort of code will require > > > > > > another fat comment above to explain why. This: > > > > > > > > > > > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > > > > > > > > > is much easier to understand, no? > > > > > > > > > > > > We should at least include that in the explanatory comment, I reckon... > > > > > > > > > > Sure, the idea of dealing with the case where @from transitioned into non cloned > > > > > skb in the function retains the existing behavior, and gives more > > > > > opportunities to > > > > > coalesce skbs. And it seems (!skb_cloned(from) && !from->pp_recycle) is enough > > > > > here. > > > > > I will take a closer look at the code path for the fragstolen case > > > > > before making v2 > > > > > patch - If @from transitioned into non cloned skb before "if > > > > > (skb_head_is_locked(from))" > > > > > > > > > > Thanks for the reviews. > > > > > > > > Actually I am not sure that works now that I look at it closer. The > > > > problem with using (!skb_cloned(from) && !from->pp_recycle) is that it > > > > breaks the case where both from and to are pp_recycle without being > > > > cloned. > > > > > > Yeah, it would break that case. Thanks! > > > > > > > > So it probably needs to be something actually the setup Jakub > > > > suggested would probably work better: > > > > if (to->pp_recycle == from->pp_recycle && !skb_cloned(from)) > > > > > > > > > > I agree. That's better. > > > > Same feeling on my side. > > I prefer not trying to merge mixed pp_recycle skbs "just because we > > could" at the expense > > of adding more code in a fast path. > > +1 here. The intention of recycling was to affect the normal path as > less as possible. On top of that, we've some amount of race > conditions over the years, trying to squeeze more performance with > similar tricks. I'd much rather be safe here, since recycling by > itself is a great performance boost > > Regards > /Ilias Sorry, I didn't check my inbox before sending out the v2 patch. Yeah, It is a bit complicated as we expected. The patch is sent out. Please take a look to see if it is the way to go, or We should stay with the current patch for simplicity reasons. Thanks!
On Thu, 6 Apr 2023 19:54:23 +0800 Liang Chen wrote: > > > Same feeling on my side. > > > I prefer not trying to merge mixed pp_recycle skbs "just because we > > > could" at the expense > > > of adding more code in a fast path. > > > > +1 here. The intention of recycling was to affect the normal path as > > less as possible. On top of that, we've some amount of race > > conditions over the years, trying to squeeze more performance with > > similar tricks. I'd much rather be safe here, since recycling by > > itself is a great performance boost > > Sorry, I didn't check my inbox before sending out the v2 patch. I can discard v2 from patchwork, let's continue the conversation here. > Yeah, It is a bit complicated as we expected. The patch is sent out. > Please take a look to see if it is the way to go, or We should stay > with the current patch for simplicity reasons. Thanks! Sounds like you know what Eric and Ilias agreed with, I'm a bit confused.. are we basically going back to v1? (hopefully with coding style fixed)
On Thu, Apr 6, 2023 at 10:59 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 6 Apr 2023 19:54:23 +0800 Liang Chen wrote: > > > > Same feeling on my side. > > > > I prefer not trying to merge mixed pp_recycle skbs "just because we > > > > could" at the expense > > > > of adding more code in a fast path. > > > > > > +1 here. The intention of recycling was to affect the normal path as > > > less as possible. On top of that, we've some amount of race > > > conditions over the years, trying to squeeze more performance with > > > similar tricks. I'd much rather be safe here, since recycling by > > > itself is a great performance boost > > > > Sorry, I didn't check my inbox before sending out the v2 patch. > > I can discard v2 from patchwork, let's continue the conversation > here. > > > Yeah, It is a bit complicated as we expected. The patch is sent out. > > Please take a look to see if it is the way to go, or We should stay > > with the current patch for simplicity reasons. Thanks! > > Sounds like you know what Eric and Ilias agreed with, I'm a bit > confused.. are we basically going back to v1? (hopefully with coding > style fixed) Sorry for making the confusion. I just felt the v2 patch complicated the code a bit. I am not quite sure if the performance gain justifies the added complexity, and would really appreciate it if you could make the decision on which patch to pick up. Thanks a lot!
On Tue, 4 Apr 2023 15:47:33 +0800 Liang Chen wrote: > - if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from))) > + if ((to->pp_recycle != from->pp_recycle) > + || (from->pp_recycle && skb_cloned(from))) > return false; It should be formatted like this: if (to->pp_recycle != from->pp_recycle || (from->pp_recycle && skb_cloned(from)))
Sure. I have addressed it and submitted the updated patch for your review as v3. Thank you for pointing it out. Thanks, Liang On Tue, Apr 11, 2023 at 8:26 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 4 Apr 2023 15:47:33 +0800 Liang Chen wrote: > > - if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from))) > > + if ((to->pp_recycle != from->pp_recycle) > > + || (from->pp_recycle && skb_cloned(from))) > > return false; > > It should be formatted like this: > > if (to->pp_recycle != from->pp_recycle || > (from->pp_recycle && skb_cloned(from)))
On Wed, 12 Apr 2023 15:27:13 +0800 Liang Chen wrote: > Sure. I have addressed it and submitted the updated patch for your > review as v3. Thank you for pointing it out. I know, I don't understand why you have to send this note, I can see the patch. And it's still in patchwork which as you know (as I trust you read our process documentation) means it's going to be reviewed. The judeo-christiano-muslim world is going thru its spring celebrations so the reviews will be slower than usual. Please be patient. And please don't top post.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 050a875d09c5..9be23ece5f03 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -5598,17 +5598,14 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, return false; /* In general, avoid mixing slab allocated and page_pool allocated - * pages within the same SKB. However when @to is not pp_recycle and - * @from is cloned, we can transition frag pages from page_pool to - * reference counted. - * - * On the other hand, don't allow coalescing two pp_recycle SKBs if - * @from is cloned, in case the SKB is using page_pool fragment - * references (PP_FLAG_PAGE_FRAG). Since we only take full page - * references for cloned SKBs at the moment that would result in - * inconsistent reference counts. + * pages within the same SKB. However don't allow coalescing two + * pp_recycle SKBs if @from is cloned, in case the SKB is using + * page_pool fragment references (PP_FLAG_PAGE_FRAG). Since we only + * take full page references for cloned SKBs at the moment that would + * result in inconsistent reference counts. */ - if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from))) + if ((to->pp_recycle != from->pp_recycle) + || (from->pp_recycle && skb_cloned(from))) return false; if (len <= skb_tailroom(to)) {
Commit 1effe8ca4e34 allowed coalescing to proceed with non page pool page and page pool page when @from is cloned, i.e. to->pp_recycle --> false from->pp_recycle --> true skb_cloned(from) --> true However, it actually requires skb_cloned(@from) to hold true until coalescing finishes in this situation. If the other cloned SKB is released while the merging is in process, from_shinfo->nr_frags will be set to 0 toward the end of the function, causing the increment of frag page _refcount to be unexpectedly skipped resulting in inconsistent reference counts. Later when SKB(@to) is released, it frees the page directly even though the page pool page is still in use, leading to use-after-free or double-free errors. So it should be prohibitted. The double-free error message below prompted us to investigate: BUG: Bad page state in process swapper/1 pfn:0e0d1 page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0xe0d1 flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000 raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000 page dumped because: nonzero _refcount CPU: 1 PID: 0 Comm: swapper/1 Tainted: G E 6.2.0+ Call Trace: <IRQ> dump_stack_lvl+0x32/0x50 bad_page+0x69/0xf0 free_pcp_prepare+0x260/0x2f0 free_unref_page+0x20/0x1c0 skb_release_data+0x10b/0x1a0 napi_consume_skb+0x56/0x150 net_rx_action+0xf0/0x350 ? __napi_schedule+0x79/0x90 __do_softirq+0xc8/0x2b1 __irq_exit_rcu+0xb9/0xf0 common_interrupt+0x82/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:default_idle+0xb/0x20 Signed-off-by: Liang Chen <liangchen.linux@gmail.com> --- net/core/skbuff.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)