Message ID | 20240304094950.761233-1-dtatulea@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC] net: esp: fix bad handling of pages from page_pool | expand |
On Mon, 4 Mar 2024 11:48:52 +0200 Dragos Tatulea wrote: > When the skb is reorganized during esp_output (!esp->inline), the pages > coming from the original skb fragments are supposed to be released back > to the system through put_page. But if the skb fragment pages are > originating from a page_pool, calling put_page on them will trigger a > page_pool leak which will eventually result in a crash. So it just does: skb_shinfo(skb)->nr_frags = 1; and assumes that's equivalent to owning a page ref on all the frags? Fix looks more or less good, we would need a new wrapper to avoid build issues without PAGE_POOL, but I wonder if we wouldn't be better off changing the other side. Instead of "cutting off" the frags - walking them and dealing with various page types. Because Mina and co. will step onto this landmine as well.
On Tue, 2024-03-05 at 19:04 -0800, Jakub Kicinski wrote: > On Mon, 4 Mar 2024 11:48:52 +0200 Dragos Tatulea wrote: > > When the skb is reorganized during esp_output (!esp->inline), the pages > > coming from the original skb fragments are supposed to be released back > > to the system through put_page. But if the skb fragment pages are > > originating from a page_pool, calling put_page on them will trigger a > > page_pool leak which will eventually result in a crash. > > So it just does: skb_shinfo(skb)->nr_frags = 1; > and assumes that's equivalent to owning a page ref on all the frags? > My understanding is different: it sets nr_frags to 1 because it's swapping out the old page frag in fragment 0 with the new xfrag page frag and will use this "new" skb from here. It does take a page reference for the xfrag page frag. > Fix looks more or less good, we would need a new wrapper to avoid > build issues without PAGE_POOL, > Ack. Which component would be best location for this wrapper: page_pool? > but I wonder if we wouldn't be better > off changing the other side. Instead of "cutting off" the frags - > walking them and dealing with various page types. Because Mina and co. > will step onto this landmine as well. The page frags are still stored and used in the sg scatterlist. If we release them at the moment when the skb is "cut off", the pages in the sg will be invalid. At least that's my understanding. Thanks, Dragos
On Wed, 6 Mar 2024 13:05:14 +0000 Dragos Tatulea wrote: > On Tue, 2024-03-05 at 19:04 -0800, Jakub Kicinski wrote: > > On Mon, 4 Mar 2024 11:48:52 +0200 Dragos Tatulea wrote: > > > When the skb is reorganized during esp_output (!esp->inline), the pages > > > coming from the original skb fragments are supposed to be released back > > > to the system through put_page. But if the skb fragment pages are > > > originating from a page_pool, calling put_page on them will trigger a > > > page_pool leak which will eventually result in a crash. > > > > So it just does: skb_shinfo(skb)->nr_frags = 1; > > and assumes that's equivalent to owning a page ref on all the frags? > > > My understanding is different: it sets nr_frags to 1 because it's swapping out > the old page frag in fragment 0 with the new xfrag page frag and will use this > "new" skb from here. It does take a page reference for the xfrag page frag. Same understanding, I'm just bad at explaining :) > > Fix looks more or less good, we would need a new wrapper to avoid > > build issues without PAGE_POOL, > > > Ack. Which component would be best location for this wrapper: page_pool? Hm, that's a judgment call. Part of me wants to put it next to napi_frag_unref(), since we basically need to factor out the insides of this function. When you post the patch the page pool crowd will give us their opinions. > > but I wonder if we wouldn't be better > > off changing the other side. Instead of "cutting off" the frags - > > walking them and dealing with various page types. Because Mina and co. > > will step onto this landmine as well. > The page frags are still stored and used in the sg scatterlist. If we release > them at the moment when the skb is "cut off", the pages in the sg will be > invalid. At least that's my understanding. I was thinking something along the lines of: for each frag() if (is_pp_page()) { get_page(); page_pool_unref_page(1); } so that it's trivial to insert another check for "is this a zero-copy" page in there, and error our. But on reflection the zero copy check may be better placed in __skb_to_sgvec(), so ignore this. Just respin what you got with a new helper.
On Wed, 2024-03-06 at 07:22 -0800, Jakub Kicinski wrote: > On Wed, 6 Mar 2024 13:05:14 +0000 Dragos Tatulea wrote: > > On Tue, 2024-03-05 at 19:04 -0800, Jakub Kicinski wrote: > > > On Mon, 4 Mar 2024 11:48:52 +0200 Dragos Tatulea wrote: > > > > When the skb is reorganized during esp_output (!esp->inline), the pages > > > > coming from the original skb fragments are supposed to be released back > > > > to the system through put_page. But if the skb fragment pages are > > > > originating from a page_pool, calling put_page on them will trigger a > > > > page_pool leak which will eventually result in a crash. > > > > > > So it just does: skb_shinfo(skb)->nr_frags = 1; > > > and assumes that's equivalent to owning a page ref on all the frags? > > > > > My understanding is different: it sets nr_frags to 1 because it's swapping out > > the old page frag in fragment 0 with the new xfrag page frag and will use this > > "new" skb from here. It does take a page reference for the xfrag page frag. > > Same understanding, I'm just bad at explaining :) > > > > Fix looks more or less good, we would need a new wrapper to avoid > > > build issues without PAGE_POOL, > > > > > Ack. Which component would be best location for this wrapper: page_pool? > > Hm, that's a judgment call. > Part of me wants to put it next to napi_frag_unref(), since we > basically need to factor out the insides of this function. > When you post the patch the page pool crowd will give us > their opinions. > Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not set? Regarding stable would I need to send a separate fix that does the raw pp page check without the API? > > > but I wonder if we wouldn't be better > > > off changing the other side. Instead of "cutting off" the frags - > > > walking them and dealing with various page types. Because Mina and co. > > > will step onto this landmine as well. > > The page frags are still stored and used in the sg scatterlist. If we release > > them at the moment when the skb is "cut off", the pages in the sg will be > > invalid. At least that's my understanding. > > I was thinking something along the lines of: > > for each frag() > if (is_pp_page()) { > get_page(); > page_pool_unref_page(1); > } > > so that it's trivial to insert another check for "is this a zero-copy" > page in there, and error our. But on reflection the zero copy check may > be better placed in __skb_to_sgvec(), so ignore this. Just respin > what you got with a new helper. > Ignored. I was hoping we wouldn't go in that direction :). Thanks, Dragos
On Wed, 6 Mar 2024 16:00:46 +0000 Dragos Tatulea wrote: > > Hm, that's a judgment call. > > Part of me wants to put it next to napi_frag_unref(), since we > > basically need to factor out the insides of this function. > > When you post the patch the page pool crowd will give us > > their opinions. > > Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not > set? Without LTO it may still be a function call. Plus, subjectively, I think that it's a bit too much logic to encode in the caller (you must also check skb->pp_recycle, AFAIU) Maybe we should make skb_pp_recycle() take struct page and move it to skbuff.h ? Rename it to skb_page_unref() ? > Regarding stable would I need to send a separate fix that does the raw pp page > check without the API? You can put them in one patch, I reckon.
On Wed, Mar 6, 2024 at 8:09 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 6 Mar 2024 16:00:46 +0000 Dragos Tatulea wrote: > > > Hm, that's a judgment call. > > > Part of me wants to put it next to napi_frag_unref(), since we > > > basically need to factor out the insides of this function. > > > When you post the patch the page pool crowd will give us > > > their opinions. > > > > Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not > > set? > > Without LTO it may still be a function call. > Plus, subjectively, I think that it's a bit too much logic to encode in > the caller (you must also check skb->pp_recycle, AFAIU) > Maybe we should make skb_pp_recycle() take struct page and move it to > skbuff.h ? Rename it to skb_page_unref() ? > Does the caller need to check skb->pp_recycle? pp_recycle seems like a redundant bit. We can tell whether the page is pp by checking is_pp_page(page). the pages in the frag must be pp pages when skb->pp_recycle is set and must be non pp pages when the skb->pp_recycle is not set, so it all seems redundant to me. My fix would be something like: diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d577e0bee18d..cc737b7b9860 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3507,17 +3507,25 @@ int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb, bool napi_pp_put_page(struct page *page, bool napi_safe); static inline void -napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe) +napi_page_unref(struct page *page, bool napi_safe) { - struct page *page = skb_frag_page(frag); - #ifdef CONFIG_PAGE_POOL - if (recycle && napi_pp_put_page(page, napi_safe)) + if (napi_pp_put_page(page, napi_safe)) return; #endif put_page(page); } +static inline void +napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe) +{ + struct page *page = skb_frag_page(frag); + + DEBUG_NET_WARN_ON(recycle != is_pp_page(page)); + + napi_page_unref(page); +} + And then use napi_page_unref() in the callers to handle page pool & non-page pool gracefully without leaking page pool internals to the callers. > > Regarding stable would I need to send a separate fix that does the raw pp page > > check without the API? > > You can put them in one patch, I reckon. -- Thanks, Mina
On Wed, 2024-03-06 at 08:40 -0800, Mina Almasry wrote: > On Wed, Mar 6, 2024 at 8:09 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Wed, 6 Mar 2024 16:00:46 +0000 Dragos Tatulea wrote: > > > > Hm, that's a judgment call. > > > > Part of me wants to put it next to napi_frag_unref(), since we > > > > basically need to factor out the insides of this function. > > > > When you post the patch the page pool crowd will give us > > > > their opinions. > > > > > > Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not > > > set? > > > > Without LTO it may still be a function call. > > Plus, subjectively, I think that it's a bit too much logic to encode in > > the caller (you must also check skb->pp_recycle, AFAIU) > > Maybe we should make skb_pp_recycle() take struct page and move it to > > skbuff.h ? Rename it to skb_page_unref() ? > > > > Does the caller need to check skb->pp_recycle? pp_recycle seems like a > redundant bit. We can tell whether the page is pp by checking > is_pp_page(page). the pages in the frag must be pp pages when > skb->pp_recycle is set and must be non pp pages when the > skb->pp_recycle is not set, so it all seems redundant to me. > AFAIU we don't have to check for pp_recycle, at least not in this specific case. > My fix would be something like: > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index d577e0bee18d..cc737b7b9860 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -3507,17 +3507,25 @@ int skb_cow_data_for_xdp(struct page_pool > *pool, struct sk_buff **pskb, > bool napi_pp_put_page(struct page *page, bool napi_safe); > > static inline void > -napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe) > +napi_page_unref(struct page *page, bool napi_safe) > { > - struct page *page = skb_frag_page(frag); > - > #ifdef CONFIG_PAGE_POOL > - if (recycle && napi_pp_put_page(page, napi_safe)) > + if (napi_pp_put_page(page, napi_safe)) > return; > #endif > put_page(page); > } > > +static inline void > +napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe) > +{ > + struct page *page = skb_frag_page(frag); > + > + DEBUG_NET_WARN_ON(recycle != is_pp_page(page)); > + > + napi_page_unref(page); > +} > + > > And then use napi_page_unref() in the callers to handle page pool & > non-page pool gracefully without leaking page pool internals to the > callers. > We'd also need to add is_pp_page() in the header with the changes above... On that line of thought, unless these new APIs are useful for other use-cases, why not keep it simple: - Move is_pp_page() to skbuff.h. - Do a simple is_pp_page(page) ? page_pool_put_full_page(page):put_page(page) in the caller? Checking skb->pp_recycle would not be needed. Thanks, Dragos > > > Regarding stable would I need to send a separate fix that does the raw pp page > > > check without the API? > > > > You can put them in one patch, I reckon. >
On Wed, Mar 6, 2024 at 9:10 AM Dragos Tatulea <dtatulea@nvidia.com> wrote: > > On Wed, 2024-03-06 at 08:40 -0800, Mina Almasry wrote: > > On Wed, Mar 6, 2024 at 8:09 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > On Wed, 6 Mar 2024 16:00:46 +0000 Dragos Tatulea wrote: > > > > > Hm, that's a judgment call. > > > > > Part of me wants to put it next to napi_frag_unref(), since we > > > > > basically need to factor out the insides of this function. > > > > > When you post the patch the page pool crowd will give us > > > > > their opinions. > > > > > > > > Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not > > > > set? > > > > > > Without LTO it may still be a function call. > > > Plus, subjectively, I think that it's a bit too much logic to encode in > > > the caller (you must also check skb->pp_recycle, AFAIU) > > > Maybe we should make skb_pp_recycle() take struct page and move it to > > > skbuff.h ? Rename it to skb_page_unref() ? > > > > > > > Does the caller need to check skb->pp_recycle? pp_recycle seems like a > > redundant bit. We can tell whether the page is pp by checking > > is_pp_page(page). the pages in the frag must be pp pages when > > skb->pp_recycle is set and must be non pp pages when the > > skb->pp_recycle is not set, so it all seems redundant to me. > > > AFAIU we don't have to check for pp_recycle, at least not in this specific case. > > > My fix would be something like: > > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > > index d577e0bee18d..cc737b7b9860 100644 > > --- a/include/linux/skbuff.h > > +++ b/include/linux/skbuff.h > > @@ -3507,17 +3507,25 @@ int skb_cow_data_for_xdp(struct page_pool > > *pool, struct sk_buff **pskb, > > bool napi_pp_put_page(struct page *page, bool napi_safe); > > > > static inline void > > -napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe) > > +napi_page_unref(struct page *page, bool napi_safe) > > { > > - struct page *page = skb_frag_page(frag); > > - > > #ifdef CONFIG_PAGE_POOL > > - if (recycle && napi_pp_put_page(page, napi_safe)) > > + if (napi_pp_put_page(page, napi_safe)) > > return; > > #endif > > put_page(page); > > } > > > > +static inline void > > +napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe) > > +{ > > + struct page *page = skb_frag_page(frag); > > + > > + DEBUG_NET_WARN_ON(recycle != is_pp_page(page)); > > + > > + napi_page_unref(page); > > +} > > + > > > > And then use napi_page_unref() in the callers to handle page pool & > > non-page pool gracefully without leaking page pool internals to the > > callers. > > > We'd also need to add is_pp_page() in the header with the changes above... > Gah, I did not realize skbuff.h doesn't have is_pp_page(). Sorry! > On that line of thought, unless these new APIs are useful for other use-cases, > why not keep it simple: > - Move is_pp_page() to skbuff.h. I prefer this. is_pp_page() is a one-liner anyway. > - Do a simple is_pp_page(page) ? page_pool_put_full_page(page):put_page(page) in > the caller? Checking skb->pp_recycle would not be needed. > IMHO page pool internals should not leak to the callers like this. There should be a helper that does the right thing for pp & non-pp pages so that the caller can use it without worrying about subtle pp concepts that they may miss at some callsites, but I'm fine either way if there is strong disagreement from others. > Thanks, > Dragos > > > > > Regarding stable would I need to send a separate fix that does the raw pp page > > > > check without the API? > > > > > > You can put them in one patch, I reckon. > > >
On Wed, 6 Mar 2024 17:09:57 +0000 Dragos Tatulea wrote: > > Does the caller need to check skb->pp_recycle? pp_recycle seems like a > > redundant bit. We can tell whether the page is pp by checking > > is_pp_page(page). the pages in the frag must be pp pages when > > skb->pp_recycle is set and must be non pp pages when the > > skb->pp_recycle is not set, so it all seems redundant to me. > > > AFAIU we don't have to check for pp_recycle, at least not in this specific case. Definitely not something we assuming in a fix that needs to go to stable. So far, AFAIU, it's legal to have an skb without skb->pp_recycle set, which holds full page refs to a PP page.
On Wed, Mar 6, 2024 at 9:41 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 6 Mar 2024 17:09:57 +0000 Dragos Tatulea wrote: > > > Does the caller need to check skb->pp_recycle? pp_recycle seems like a > > > redundant bit. We can tell whether the page is pp by checking > > > is_pp_page(page). the pages in the frag must be pp pages when > > > skb->pp_recycle is set and must be non pp pages when the > > > skb->pp_recycle is not set, so it all seems redundant to me. > > > > > AFAIU we don't have to check for pp_recycle, at least not in this specific case. > > Definitely not something we assuming in a fix that needs to go > to stable. > > So far, AFAIU, it's legal to have an skb without skb->pp_recycle > set, which holds full page refs to a PP page. Interesting. I apologize then I did not realize this combination is legal. But I have a follow up question: what is the ref code supposed to do in this combination? AFAIU: - skb->pp_recycle && is_pp_page() ref via page_pool_ref_page() unref via page_pool_unref_page() - !skb->pp_recycle && !is_pp_page() ref via get_page() unref via put_page() - !skb->pp_recycle && is_pp_page() ref via ? unref via? Also is this combination also legal you think? and if so what to do? - skb->pp_recycle && !is_pp_page() ref via ? unref via? I'm asking because I'm starting to wonder if this patch has some issue in it: https://patchwork.kernel.org/project/netdevbpf/patch/20231215033011.12107-4-liangchen.linux@gmail.com/ Because in this patch, if we have a !skb->pp_recycle & is_pp_page() combination we end up doing in skb_try_coalesce(): ref via page_pool_ref_page() unref via put_page() via eventual napi_frag_unref() which seems like an issue, no?
On Wed, Mar 6, 2024 at 10:41 AM Mina Almasry <almasrymina@google.com> wrote: > > On Wed, Mar 6, 2024 at 9:41 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Wed, 6 Mar 2024 17:09:57 +0000 Dragos Tatulea wrote: > > > > Does the caller need to check skb->pp_recycle? pp_recycle seems like a > > > > redundant bit. We can tell whether the page is pp by checking > > > > is_pp_page(page). the pages in the frag must be pp pages when > > > > skb->pp_recycle is set and must be non pp pages when the > > > > skb->pp_recycle is not set, so it all seems redundant to me. > > > > > > > AFAIU we don't have to check for pp_recycle, at least not in this specific case. > > > > Definitely not something we assuming in a fix that needs to go > > to stable. > > > > So far, AFAIU, it's legal to have an skb without skb->pp_recycle > > set, which holds full page refs to a PP page. > > Interesting. I apologize then I did not realize this combination is > legal. But I have a follow up question: what is the ref code supposed > to do in this combination? AFAIU: > > - skb->pp_recycle && is_pp_page() > ref via page_pool_ref_page() > unref via page_pool_unref_page() > > - !skb->pp_recycle && !is_pp_page() > ref via get_page() > unref via put_page() > > - !skb->pp_recycle && is_pp_page() > ref via ? > unref via? > > Also is this combination also legal you think? and if so what to do? > - skb->pp_recycle && !is_pp_page() > ref via ? > unref via? > > I'm asking because I'm starting to wonder if this patch has some issue in it: > https://patchwork.kernel.org/project/netdevbpf/patch/20231215033011.12107-4-liangchen.linux@gmail.com/ > > Because in this patch, if we have a !skb->pp_recycle & is_pp_page() > combination we end up doing in skb_try_coalesce(): > ref via page_pool_ref_page() > unref via put_page() via eventual napi_frag_unref() > > which seems like an issue, no? > Gah, nevermind, skb_pp_frag_ref() actually returns -EINVAL if !skb->pp_recycle, and in the call site we do a skb_frag_ref() on this error, so all in all we end up doing a get_page/put_page pair. Sorry for the noise. So we're supposed to: - !skb->pp_recycle && is_pp_page() ref via get_page unref via put_page Very subtle stuff (for me at least). I'll try to propose some cleanup to make this a bit simpler using helpers that handle all these subtle details internally so that the call sites don't have to do this special handling.
On Wed, 6 Mar 2024 10:46:45 -0800 Mina Almasry wrote: > Gah, nevermind, skb_pp_frag_ref() actually returns -EINVAL if > !skb->pp_recycle, and in the call site we do a skb_frag_ref() on this > error, so all in all we end up doing a get_page/put_page pair. Sorry > for the noise. > > So we're supposed to: > - !skb->pp_recycle && is_pp_page() > ref via get_page > unref via put_page > > Very subtle stuff (for me at least). I'll try to propose some cleanup > to make this a bit simpler using helpers that handle all these subtle > details internally so that the call sites don't have to do this > special handling. Sure, although complexity is complexity, we can only do so much to hide it. For pp_recycle - the problem is when we added page pool pages, hardly anything in the upper layers of the stack was made pp aware. So we can end up with someone doing get_page(page); skb_fill_page_desc(skb, page); on a PP page. You probably inspected a lot of those cases for the ZC work, and they can be fixed up to do a "pp-aware get()", but until then we need skb->pp_recycle. pp_recycle kinda denotes whether whoever constructed the skb was PP aware. So, yes, definitely a good long term goal, but not in a fix :)
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index 4dd9e5040672..3e07d78c887d 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -112,9 +112,14 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp) /* Unref skb_frag_pages in the src scatterlist if necessary. * Skip the first sg which comes from skb->data. */ - if (req->src != req->dst) - for (sg = sg_next(req->src); sg; sg = sg_next(sg)) - put_page(sg_page(sg)); + if (req->src != req->dst) { + for (sg = sg_next(req->src); sg; sg = sg_next(sg)) { + struct page *page = sg_page(sg); + + if (!napi_pp_put_page(page, false)) + put_page(page); + } + } } #ifdef CONFIG_INET_ESPINTCP diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c index 6e6efe026cdc..b73f5773139d 100644 --- a/net/ipv6/esp6.c +++ b/net/ipv6/esp6.c @@ -129,9 +129,14 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp) /* Unref skb_frag_pages in the src scatterlist if necessary. * Skip the first sg which comes from skb->data. */ - if (req->src != req->dst) - for (sg = sg_next(req->src); sg; sg = sg_next(sg)) - put_page(sg_page(sg)); + if (req->src != req->dst) { + for (sg = sg_next(req->src); sg; sg = sg_next(sg)) { + struct page *page = sg_page(sg); + + if (!napi_pp_put_page(page, false)) + put_page(page); + } + } } #ifdef CONFIG_INET6_ESPINTCP
When the skb is reorganized during esp_output (!esp->inline), the pages coming from the original skb fragments are supposed to be released back to the system through put_page. But if the skb fragment pages are originating from a page_pool, calling put_page on them will trigger a page_pool leak which will eventually result in a crash. This leak can be easily observed when using CONFIG_DEBUG_VM and doing ipsec + gre (non offloaded) forwarding: BUG: Bad page state in process ksoftirqd/16 pfn:1451b6 page:00000000de2b8d32 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1451b6000 pfn:0x1451b6 flags: 0x200000000000000(node=0|zone=2) page_type: 0xffffffff() raw: 0200000000000000 dead000000000040 ffff88810d23c000 0000000000000000 raw: 00000001451b6000 0000000000000001 00000000ffffffff 0000000000000000 page dumped because: page_pool leak Modules linked in: ip_gre gre mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay zram zsmalloc fuse [last unloaded: mlx5_core] CPU: 16 PID: 96 Comm: ksoftirqd/16 Not tainted 6.8.0-rc4+ #22 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x36/0x50 bad_page+0x70/0xf0 free_unref_page_prepare+0x27a/0x460 free_unref_page+0x38/0x120 esp_ssg_unref.isra.0+0x15f/0x200 esp_output_tail+0x66d/0x780 esp_xmit+0x2c5/0x360 validate_xmit_xfrm+0x313/0x370 ? validate_xmit_skb+0x1d/0x330 validate_xmit_skb_list+0x4c/0x70 sch_direct_xmit+0x23e/0x350 __dev_queue_xmit+0x337/0xba0 ? nf_hook_slow+0x3f/0xd0 ip_finish_output2+0x25e/0x580 iptunnel_xmit+0x19b/0x240 ip_tunnel_xmit+0x5fb/0xb60 ipgre_xmit+0x14d/0x280 [ip_gre] dev_hard_start_xmit+0xc3/0x1c0 __dev_queue_xmit+0x208/0xba0 ? nf_hook_slow+0x3f/0xd0 ip_finish_output2+0x1ca/0x580 ip_sublist_rcv_finish+0x32/0x40 ip_sublist_rcv+0x1b2/0x1f0 ? ip_rcv_finish_core.constprop.0+0x460/0x460 ip_list_rcv+0x103/0x130 __netif_receive_skb_list_core+0x181/0x1e0 netif_receive_skb_list_internal+0x1b3/0x2c0 napi_gro_receive+0xc8/0x200 gro_cell_poll+0x52/0x90 __napi_poll+0x25/0x1a0 net_rx_action+0x28e/0x300 __do_softirq+0xc3/0x276 ? sort_range+0x20/0x20 run_ksoftirqd+0x1e/0x30 smpboot_thread_fn+0xa6/0x130 kthread+0xcd/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x31/0x50 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> The suggested fix is to use the page_pool release API first and then fallback to put_page. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reported-by: Anatoli N.Chechelnickiy <Anatoli.Chechelnickiy@m.interpipe.biz> Reported-by: Ian Kumlien <ian.kumlien@gmail.com> Change-Id: I263cf91c1d13c2736a58927e8e0fc51296759450 --- net/ipv4/esp4.c | 11 ++++++++--- net/ipv6/esp6.c | 11 ++++++++--- 2 files changed, 16 insertions(+), 6 deletions(-)