diff mbox series

[v2,net-next] page_pool: avoid infinite loop to schedule delayed worker

Message ID 20250213052150.18392-1-kerneljasonxing@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [v2,net-next] page_pool: avoid infinite loop to schedule delayed worker | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: The commit message has 'Call Trace:', perhaps it also needs a 'Fixes:' tag?
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2025-02-13--06-00 (tests: 886)

Commit Message

Jason Xing Feb. 13, 2025, 5:21 a.m. UTC
We noticed the kworker in page_pool_release_retry() was waken
up repeatedly and infinitely in production because of the
buggy driver causing the inflight less than 0 and warning
us in page_pool_inflight()[1].

Since the inflight value goes negative, it means we should
not expect the whole page_pool to get back to work normally.

This patch mitigates the adverse effect by not rescheduling
the kworker when detecting the inflight negative in
page_pool_release_retry().

This patch allows to pr_warn() once in page_pool_release_retry()
in this case.

[1]
[Mon Feb 10 20:36:11 2025] ------------[ cut here ]------------
[Mon Feb 10 20:36:11 2025] Negative(-51446) inflight packet-pages
...
[Mon Feb 10 20:36:11 2025] Call Trace:
[Mon Feb 10 20:36:11 2025]  page_pool_release_retry+0x23/0x70
[Mon Feb 10 20:36:11 2025]  process_one_work+0x1b1/0x370
[Mon Feb 10 20:36:11 2025]  worker_thread+0x37/0x3a0
[Mon Feb 10 20:36:11 2025]  kthread+0x11a/0x140
[Mon Feb 10 20:36:11 2025]  ? process_one_work+0x370/0x370
[Mon Feb 10 20:36:11 2025]  ? __kthread_cancel_work+0x40/0x40
[Mon Feb 10 20:36:11 2025]  ret_from_fork+0x35/0x40
[Mon Feb 10 20:36:11 2025] ---[ end trace ebffe800f33e7e34 ]---
Note: before this patch, the above calltrace would flood the
dmesg due to repeated reschedule of release_dw kworker.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
v2
Link: https://lore.kernel.org/all/20250210130953.26831-1-kerneljasonxing@gmail.com/
1. add more details in commit message.
2. allow printing once when the inflight is negative.
3. correct the position where to stop the reschedule.
Suggested by Mina and Jakub.
---
 net/core/page_pool.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

Comments

Paolo Abeni Feb. 13, 2025, 8:32 a.m. UTC | #1
On 2/13/25 6:21 AM, Jason Xing wrote:
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 1c6fec08bc43..e1f89a19a6b6 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq)
>  	int inflight;
>  
>  	inflight = page_pool_release(pool);
> -	if (!inflight)
> -		return;
>  
>  	/* Periodic warning for page pools the user can't see */
>  	netdev = READ_ONCE(pool->slow.netdev);

This causes UaF, as catched by the CI:

https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr

at this point 'inflight' could be 0 and 'pool' already freed.

/P
Jason Xing Feb. 13, 2025, 10:48 a.m. UTC | #2
On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 2/13/25 6:21 AM, Jason Xing wrote:
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 1c6fec08bc43..e1f89a19a6b6 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq)
> >       int inflight;
> >
> >       inflight = page_pool_release(pool);
> > -     if (!inflight)
> > -             return;
> >
> >       /* Periodic warning for page pools the user can't see */
> >       netdev = READ_ONCE(pool->slow.netdev);
>
> This causes UaF, as catched by the CI:
>
> https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr
>
> at this point 'inflight' could be 0 and 'pool' already freed.

Oh, right, thanks for catching that.

I'm going to use the previous approach (one-liner with a few comments):
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1c6fec08bc43..209b5028abd7 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct
work_struct *wq)
        int inflight;

        inflight = page_pool_release(pool);
-       if (!inflight)
+       /* In rare cases, a driver bug may cause inflight to go negative.
+        * Don't reschedule release if inflight is 0 or negative.
+        * - If 0, the page_pool has been destroyed
+        * - if negative, we will never recover
+        *   in both cases no reschedule is necessary.
+        */
+       if (inflight <= 0)
                return;

Thanks,
Jason
Mina Almasry Feb. 13, 2025, 8:14 p.m. UTC | #3
On Thu, Feb 13, 2025 at 2:49 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On 2/13/25 6:21 AM, Jason Xing wrote:
> > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > index 1c6fec08bc43..e1f89a19a6b6 100644
> > > --- a/net/core/page_pool.c
> > > +++ b/net/core/page_pool.c
> > > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq)
> > >       int inflight;
> > >
> > >       inflight = page_pool_release(pool);
> > > -     if (!inflight)
> > > -             return;
> > >
> > >       /* Periodic warning for page pools the user can't see */
> > >       netdev = READ_ONCE(pool->slow.netdev);
> >
> > This causes UaF, as catched by the CI:
> >
> > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr
> >
> > at this point 'inflight' could be 0 and 'pool' already freed.
>
> Oh, right, thanks for catching that.
>
> I'm going to use the previous approach (one-liner with a few comments):
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 1c6fec08bc43..209b5028abd7 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct
> work_struct *wq)
>         int inflight;
>
>         inflight = page_pool_release(pool);
> -       if (!inflight)
> +       /* In rare cases, a driver bug may cause inflight to go negative.
> +        * Don't reschedule release if inflight is 0 or negative.
> +        * - If 0, the page_pool has been destroyed
> +        * - if negative, we will never recover
> +        *   in both cases no reschedule is necessary.
> +        */
> +       if (inflight <= 0)
>                 return;
>

I think it could still be good to have us warn once so that this bug
is not silent.

We can return early if page_pool_release(pool) == 0, and then only
schedule_delayed_work() after the warning if inflight is positive.
Jason Xing Feb. 13, 2025, 11:42 p.m. UTC | #4
On Fri, Feb 14, 2025 at 4:14 AM Mina Almasry <almasrymina@google.com> wrote:
>
> On Thu, Feb 13, 2025 at 2:49 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On 2/13/25 6:21 AM, Jason Xing wrote:
> > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > > index 1c6fec08bc43..e1f89a19a6b6 100644
> > > > --- a/net/core/page_pool.c
> > > > +++ b/net/core/page_pool.c
> > > > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq)
> > > >       int inflight;
> > > >
> > > >       inflight = page_pool_release(pool);
> > > > -     if (!inflight)
> > > > -             return;
> > > >
> > > >       /* Periodic warning for page pools the user can't see */
> > > >       netdev = READ_ONCE(pool->slow.netdev);
> > >
> > > This causes UaF, as catched by the CI:
> > >
> > > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr
> > >
> > > at this point 'inflight' could be 0 and 'pool' already freed.
> >
> > Oh, right, thanks for catching that.
> >
> > I'm going to use the previous approach (one-liner with a few comments):
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 1c6fec08bc43..209b5028abd7 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct
> > work_struct *wq)
> >         int inflight;
> >
> >         inflight = page_pool_release(pool);
> > -       if (!inflight)
> > +       /* In rare cases, a driver bug may cause inflight to go negative.
> > +        * Don't reschedule release if inflight is 0 or negative.
> > +        * - If 0, the page_pool has been destroyed
> > +        * - if negative, we will never recover
> > +        *   in both cases no reschedule is necessary.
> > +        */
> > +       if (inflight <= 0)
> >                 return;
> >
>
> I think it could still be good to have us warn once so that this bug
> is not silent.

Allow me to double-check what you meant here. Applying the above
patch, we do at least see the warning once in
page_pool_release_retry()->page_pool_release()->page_pool_inflight()->WARN()
before stopping the reschedule.

Do you expect to see another warning, namely, pr_warn() in
page_pool_release_retry()? If so, I assume you expect to only print
out the pool->user.id?

>
> We can return early if page_pool_release(pool) == 0, and then only
> schedule_delayed_work() after the warning if inflight is positive.

Based on the above analysis, can we adjust in this way:
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1c6fec08bc43..e1831cc23d9c 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -625,8 +625,8 @@ s32 page_pool_inflight(const struct page_pool
*pool, bool strict)

        if (strict) {
                trace_page_pool_release(pool, inflight, hold_cnt, release_cnt);
-               WARN(inflight < 0, "Negative(%d) inflight packet-pages",
-                    inflight);
+               WARN(inflight < 0, "Pool id(%u): negative(%d) inflight
packet-pages",
+                    pool->user.id, inflight);
        } else {
                inflight = max(0, inflight);
        }
@@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct
work_struct *wq)
        int inflight;

        inflight = page_pool_release(pool);
-       if (!inflight)
+       /* In rare cases, a driver bug may cause inflight to go negative.
+        * Don't reschedule release if inflight is 0 or negative.
+        * - If 0, the page_pool has been destroyed
+        * - if negative, we will never recover
+        *   in both cases no reschedule is necessary.
+        */
+       if (inflight <= 0)
                return;

Thanks,
Jason
Mina Almasry Feb. 14, 2025, 4:09 a.m. UTC | #5
On Thu, Feb 13, 2025 at 3:43 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Fri, Feb 14, 2025 at 4:14 AM Mina Almasry <almasrymina@google.com> wrote:
> >
> > On Thu, Feb 13, 2025 at 2:49 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > >
> > > > On 2/13/25 6:21 AM, Jason Xing wrote:
> > > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > > > index 1c6fec08bc43..e1f89a19a6b6 100644
> > > > > --- a/net/core/page_pool.c
> > > > > +++ b/net/core/page_pool.c
> > > > > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq)
> > > > >       int inflight;
> > > > >
> > > > >       inflight = page_pool_release(pool);
> > > > > -     if (!inflight)
> > > > > -             return;
> > > > >
> > > > >       /* Periodic warning for page pools the user can't see */
> > > > >       netdev = READ_ONCE(pool->slow.netdev);
> > > >
> > > > This causes UaF, as catched by the CI:
> > > >
> > > > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr
> > > >
> > > > at this point 'inflight' could be 0 and 'pool' already freed.
> > >
> > > Oh, right, thanks for catching that.
> > >
> > > I'm going to use the previous approach (one-liner with a few comments):
> > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > index 1c6fec08bc43..209b5028abd7 100644
> > > --- a/net/core/page_pool.c
> > > +++ b/net/core/page_pool.c
> > > @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct
> > > work_struct *wq)
> > >         int inflight;
> > >
> > >         inflight = page_pool_release(pool);
> > > -       if (!inflight)
> > > +       /* In rare cases, a driver bug may cause inflight to go negative.
> > > +        * Don't reschedule release if inflight is 0 or negative.
> > > +        * - If 0, the page_pool has been destroyed
> > > +        * - if negative, we will never recover
> > > +        *   in both cases no reschedule is necessary.
> > > +        */
> > > +       if (inflight <= 0)
> > >                 return;
> > >
> >
> > I think it could still be good to have us warn once so that this bug
> > is not silent.
>
> Allow me to double-check what you meant here. Applying the above
> patch, we do at least see the warning once in
> page_pool_release_retry()->page_pool_release()->page_pool_inflight()->WARN()
> before stopping the reschedule.
>

Ah, I see. I had missed indeed that we warn in page_pool_inflight() if
we see anything negative there anyway. Nevermind then. Thanks!
Jason Xing Feb. 14, 2025, 6:13 a.m. UTC | #6
On Fri, Feb 14, 2025 at 12:10 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Thu, Feb 13, 2025 at 3:43 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Fri, Feb 14, 2025 at 4:14 AM Mina Almasry <almasrymina@google.com> wrote:
> > >
> > > On Thu, Feb 13, 2025 at 2:49 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > >
> > > > > On 2/13/25 6:21 AM, Jason Xing wrote:
> > > > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > > > > index 1c6fec08bc43..e1f89a19a6b6 100644
> > > > > > --- a/net/core/page_pool.c
> > > > > > +++ b/net/core/page_pool.c
> > > > > > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq)
> > > > > >       int inflight;
> > > > > >
> > > > > >       inflight = page_pool_release(pool);
> > > > > > -     if (!inflight)
> > > > > > -             return;
> > > > > >
> > > > > >       /* Periodic warning for page pools the user can't see */
> > > > > >       netdev = READ_ONCE(pool->slow.netdev);
> > > > >
> > > > > This causes UaF, as catched by the CI:
> > > > >
> > > > > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr
> > > > >
> > > > > at this point 'inflight' could be 0 and 'pool' already freed.
> > > >
> > > > Oh, right, thanks for catching that.
> > > >
> > > > I'm going to use the previous approach (one-liner with a few comments):
> > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > > index 1c6fec08bc43..209b5028abd7 100644
> > > > --- a/net/core/page_pool.c
> > > > +++ b/net/core/page_pool.c
> > > > @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct
> > > > work_struct *wq)
> > > >         int inflight;
> > > >
> > > >         inflight = page_pool_release(pool);
> > > > -       if (!inflight)
> > > > +       /* In rare cases, a driver bug may cause inflight to go negative.
> > > > +        * Don't reschedule release if inflight is 0 or negative.
> > > > +        * - If 0, the page_pool has been destroyed
> > > > +        * - if negative, we will never recover
> > > > +        *   in both cases no reschedule is necessary.
> > > > +        */
> > > > +       if (inflight <= 0)
> > > >                 return;
> > > >
> > >
> > > I think it could still be good to have us warn once so that this bug
> > > is not silent.
> >
> > Allow me to double-check what you meant here. Applying the above
> > patch, we do at least see the warning once in
> > page_pool_release_retry()->page_pool_release()->page_pool_inflight()->WARN()
> > before stopping the reschedule.
> >
>
> Ah, I see. I had missed indeed that we warn in page_pool_inflight() if
> we see anything negative there anyway. Nevermind then. Thanks!

Okay, then a one-liner patch is enough :)

Thanks for the review.
diff mbox series

Patch

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1c6fec08bc43..e1f89a19a6b6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1112,13 +1112,12 @@  static void page_pool_release_retry(struct work_struct *wq)
 	int inflight;
 
 	inflight = page_pool_release(pool);
-	if (!inflight)
-		return;
 
 	/* Periodic warning for page pools the user can't see */
 	netdev = READ_ONCE(pool->slow.netdev);
 	if (time_after_eq(jiffies, pool->defer_warn) &&
-	    (!netdev || netdev == NET_PTR_POISON)) {
+	    (!netdev || netdev == NET_PTR_POISON) &&
+	    inflight != 0) {
 		int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
 
 		pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
@@ -1126,7 +1125,15 @@  static void page_pool_release_retry(struct work_struct *wq)
 		pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
 	}
 
-	/* Still not ready to be disconnected, retry later */
+	/* In rare cases, a driver bug may cause inflight to go negative.
+	 * Don't reschedule release if inflight is 0 or negative.
+	 * - If 0, the page_pool has been destroyed
+	 * - if negative, we will never recover
+	 *   in both cases no reschedule is necessary.
+	 */
+	if (inflight <= 0)
+		return;
+
 	schedule_delayed_work(&pool->release_dw, DEFER_TIME);
 }