diff mbox series

[v5] skbuff: Fix a race between coalescing and releasing SKBs

Message ID 20230413090353.14448-1-liangchen.linux@gmail.com (mailing list archive)
State Accepted
Commit 0646dc31ca886693274df5749cd0c8c1eaaeb5ca
Delegated to: Netdev Maintainers
Headers show
Series [v5] skbuff: Fix a race between coalescing and releasing SKBs | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 20 this patch: 20
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 28 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Liang Chen April 13, 2023, 9:03 a.m. UTC
Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
recycling") allowed coalescing to proceed with non page pool page and page
pool page when @from is cloned, i.e.

to->pp_recycle    --> false
from->pp_recycle  --> true
skb_cloned(from)  --> true

However, it actually requires skb_cloned(@from) to hold true until
coalescing finishes in this situation. If the other cloned SKB is
released while the merging is in process, from_shinfo->nr_frags will be
set to 0 toward the end of the function, causing the increment of frag
page _refcount to be unexpectedly skipped resulting in inconsistent
reference counts. Later when SKB(@to) is released, it frees the page
directly even though the page pool page is still in use, leading to
use-after-free or double-free errors. So it should be prohibited.

The double-free error message below prompted us to investigate:
BUG: Bad page state in process swapper/1  pfn:0e0d1
page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
index:0x2 pfn:0xe0d1
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount

CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
Call Trace:
 <IRQ>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_pcp_prepare+0x260/0x2f0
free_unref_page+0x20/0x1c0
skb_release_data+0x10b/0x1a0
napi_consume_skb+0x56/0x150
net_rx_action+0xf0/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xc8/0x2b1
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x82/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0xb/0x20

Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
Changes from v4:
- fixes some style issues
---
 net/core/skbuff.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Comments

Yunsheng Lin April 13, 2023, 11:42 a.m. UTC | #1
On 2023/4/13 17:03, Liang Chen wrote:
> Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
> recycling") allowed coalescing to proceed with non page pool page and page
> pool page when @from is cloned, i.e.
> 
> to->pp_recycle    --> false
> from->pp_recycle  --> true
> skb_cloned(from)  --> true
> 
> However, it actually requires skb_cloned(@from) to hold true until
> coalescing finishes in this situation. If the other cloned SKB is
> released while the merging is in process, from_shinfo->nr_frags will be
> set to 0 toward the end of the function, causing the increment of frag
> page _refcount to be unexpectedly skipped resulting in inconsistent
> reference counts. Later when SKB(@to) is released, it frees the page
> directly even though the page pool page is still in use, leading to
> use-after-free or double-free errors. So it should be prohibited.
> 
> The double-free error message below prompted us to investigate:
> BUG: Bad page state in process swapper/1  pfn:0e0d1
> page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
> index:0x2 pfn:0xe0d1
> flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
> raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
> page dumped because: nonzero _refcount
> 
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
> Call Trace:
>  <IRQ>
> dump_stack_lvl+0x32/0x50
> bad_page+0x69/0xf0
> free_pcp_prepare+0x260/0x2f0
> free_unref_page+0x20/0x1c0
> skb_release_data+0x10b/0x1a0
> napi_consume_skb+0x56/0x150
> net_rx_action+0xf0/0x350
> ? __napi_schedule+0x79/0x90
> __do_softirq+0xc8/0x2b1
> __irq_exit_rcu+0xb9/0xf0
> common_interrupt+0x82/0xa0
> </IRQ>
> <TASK>
> asm_common_interrupt+0x22/0x40
> RIP: 0010:default_idle+0xb/0x20
> 
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")

I am not quite sure the above is right Fixes tag.
As 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment recycling") has tried
to fix it, and it missed the case this patch is fixing, so we need another fix here.
Eric Dumazet April 13, 2023, 1:45 p.m. UTC | #2
On Thu, Apr 13, 2023 at 1:42 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/4/13 17:03, Liang Chen wrote:

> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>
> I am not quite sure the above is right Fixes tag.
> As 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment recycling") has tried
> to fix it, and it missed the case this patch is fixing, so we need another fix here.

The Fixes: tag is more about pointing to stable teams how to deal with
backports.
There is no point giving the full chain, because this 'blamed' commit is enough.

If an old kernel does not contain this commit, there is no point trying harder.
Jakub Kicinski April 13, 2023, 4:41 p.m. UTC | #3
On Thu, 13 Apr 2023 17:03:53 +0800 Liang Chen wrote:
> Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
> recycling") allowed coalescing to proceed with non page pool page and page
> pool page when @from is cloned, i.e.

I think Alex is out for spring celebrations so let me take this 
in for today's PR. Thanks!
patchwork-bot+netdevbpf@kernel.org April 13, 2023, 5:20 p.m. UTC | #4
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 13 Apr 2023 17:03:53 +0800 you wrote:
> Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
> recycling") allowed coalescing to proceed with non page pool page and page
> pool page when @from is cloned, i.e.
> 
> to->pp_recycle    --> false
> from->pp_recycle  --> true
> skb_cloned(from)  --> true
> 
> [...]

Here is the summary with links:
  - [v5] skbuff: Fix a race between coalescing and releasing SKBs
    https://git.kernel.org/netdev/net/c/0646dc31ca88

You are awesome, thank you!
Yunsheng Lin April 14, 2023, 1:41 a.m. UTC | #5
On 2023/4/13 21:45, Eric Dumazet wrote:
> On Thu, Apr 13, 2023 at 1:42 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2023/4/13 17:03, Liang Chen wrote:
> 
>>> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>>
>> I am not quite sure the above is right Fixes tag.
>> As 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment recycling") has tried
>> to fix it, and it missed the case this patch is fixing, so we need another fix here.
> 
> The Fixes: tag is more about pointing to stable teams how to deal with
> backports.
> There is no point giving the full chain, because this 'blamed' commit is enough.
> 
> If an old kernel does not contain this commit, there is no point trying harder.

In that case, it may be better to point to
6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") too.

As even without fragment support, for below case:
to->pp_recycle    --> true
from->pp_recycle  --> true
skb_cloned(from)  --> true

It seems some page may be called with page_pool_release_page() twice if 'to' and
cloned skb of 'from' are freed concurrently, as there is data race between checking
page->pp_magic and clearing page->pp_magic.

Anyway, as it is merged already, I guess it is not really matter anymore.


> .
>
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 050a875d09c5..7b83410bbaae 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5597,18 +5597,18 @@  bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	if (skb_cloned(to))
 		return false;
 
-	/* In general, avoid mixing slab allocated and page_pool allocated
-	 * pages within the same SKB. However when @to is not pp_recycle and
-	 * @from is cloned, we can transition frag pages from page_pool to
-	 * reference counted.
-	 *
-	 * On the other hand, don't allow coalescing two pp_recycle SKBs if
-	 * @from is cloned, in case the SKB is using page_pool fragment
-	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
-	 * references for cloned SKBs at the moment that would result in
+	/* In general, avoid mixing page_pool and non-page_pool allocated
+	 * pages within the same SKB. Additionally avoid dealing with clones
+	 * containing page_pool pages, in case the SKB is using page_pool
+	 * fragment references (PP_FLAG_PAGE_FRAG). Since we only take full
+	 * page references for cloned SKBs at the moment that would result in
 	 * inconsistent reference counts.
+	 * In theory we could take full references if from is cloned and
+	 * !@to->pp_recycle but its tricky (due to potential race with the clone
+	 * disappearing) and rare, so not worth dealing with.
 	 */
-	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
+	if (to->pp_recycle != from->pp_recycle ||
+	    (from->pp_recycle && skb_cloned(from)))
 		return false;
 
 	if (len <= skb_tailroom(to)) {