Message ID | 20250213052150.18392-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v2,net-next] page_pool: avoid infinite loop to schedule delayed worker | expand |
On 2/13/25 6:21 AM, Jason Xing wrote: > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > index 1c6fec08bc43..e1f89a19a6b6 100644 > --- a/net/core/page_pool.c > +++ b/net/core/page_pool.c > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq) > int inflight; > > inflight = page_pool_release(pool); > - if (!inflight) > - return; > > /* Periodic warning for page pools the user can't see */ > netdev = READ_ONCE(pool->slow.netdev); This causes UaF, as catched by the CI: https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr at this point 'inflight' could be 0 and 'pool' already freed. /P
On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 2/13/25 6:21 AM, Jason Xing wrote: > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > > index 1c6fec08bc43..e1f89a19a6b6 100644 > > --- a/net/core/page_pool.c > > +++ b/net/core/page_pool.c > > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq) > > int inflight; > > > > inflight = page_pool_release(pool); > > - if (!inflight) > > - return; > > > > /* Periodic warning for page pools the user can't see */ > > netdev = READ_ONCE(pool->slow.netdev); > > This causes UaF, as catched by the CI: > > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr > > at this point 'inflight' could be 0 and 'pool' already freed. Oh, right, thanks for catching that. I'm going to use the previous approach (one-liner with a few comments): diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 1c6fec08bc43..209b5028abd7 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct work_struct *wq) int inflight; inflight = page_pool_release(pool); - if (!inflight) + /* In rare cases, a driver bug may cause inflight to go negative. + * Don't reschedule release if inflight is 0 or negative. + * - If 0, the page_pool has been destroyed + * - if negative, we will never recover + * in both cases no reschedule is necessary. + */ + if (inflight <= 0) return; Thanks, Jason
On Thu, Feb 13, 2025 at 2:49 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > On 2/13/25 6:21 AM, Jason Xing wrote: > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > > > index 1c6fec08bc43..e1f89a19a6b6 100644 > > > --- a/net/core/page_pool.c > > > +++ b/net/core/page_pool.c > > > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq) > > > int inflight; > > > > > > inflight = page_pool_release(pool); > > > - if (!inflight) > > > - return; > > > > > > /* Periodic warning for page pools the user can't see */ > > > netdev = READ_ONCE(pool->slow.netdev); > > > > This causes UaF, as catched by the CI: > > > > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr > > > > at this point 'inflight' could be 0 and 'pool' already freed. > > Oh, right, thanks for catching that. > > I'm going to use the previous approach (one-liner with a few comments): > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > index 1c6fec08bc43..209b5028abd7 100644 > --- a/net/core/page_pool.c > +++ b/net/core/page_pool.c > @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct > work_struct *wq) > int inflight; > > inflight = page_pool_release(pool); > - if (!inflight) > + /* In rare cases, a driver bug may cause inflight to go negative. > + * Don't reschedule release if inflight is 0 or negative. > + * - If 0, the page_pool has been destroyed > + * - if negative, we will never recover > + * in both cases no reschedule is necessary. > + */ > + if (inflight <= 0) > return; > I think it could still be good to have us warn once so that this bug is not silent. We can return early if page_pool_release(pool) == 0, and then only schedule_delayed_work() after the warning if inflight is positive.
On Fri, Feb 14, 2025 at 4:14 AM Mina Almasry <almasrymina@google.com> wrote: > > On Thu, Feb 13, 2025 at 2:49 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > On 2/13/25 6:21 AM, Jason Xing wrote: > > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > > > > index 1c6fec08bc43..e1f89a19a6b6 100644 > > > > --- a/net/core/page_pool.c > > > > +++ b/net/core/page_pool.c > > > > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq) > > > > int inflight; > > > > > > > > inflight = page_pool_release(pool); > > > > - if (!inflight) > > > > - return; > > > > > > > > /* Periodic warning for page pools the user can't see */ > > > > netdev = READ_ONCE(pool->slow.netdev); > > > > > > This causes UaF, as catched by the CI: > > > > > > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr > > > > > > at this point 'inflight' could be 0 and 'pool' already freed. > > > > Oh, right, thanks for catching that. > > > > I'm going to use the previous approach (one-liner with a few comments): > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > > index 1c6fec08bc43..209b5028abd7 100644 > > --- a/net/core/page_pool.c > > +++ b/net/core/page_pool.c > > @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct > > work_struct *wq) > > int inflight; > > > > inflight = page_pool_release(pool); > > - if (!inflight) > > + /* In rare cases, a driver bug may cause inflight to go negative. > > + * Don't reschedule release if inflight is 0 or negative. > > + * - If 0, the page_pool has been destroyed > > + * - if negative, we will never recover > > + * in both cases no reschedule is necessary. > > + */ > > + if (inflight <= 0) > > return; > > > > I think it could still be good to have us warn once so that this bug > is not silent. Allow me to double-check what you meant here. Applying the above patch, we do at least see the warning once in page_pool_release_retry()->page_pool_release()->page_pool_inflight()->WARN() before stopping the reschedule. Do you expect to see another warning, namely, pr_warn() in page_pool_release_retry()? If so, I assume you expect to only print out the pool->user.id? > > We can return early if page_pool_release(pool) == 0, and then only > schedule_delayed_work() after the warning if inflight is positive. Based on the above analysis, can we adjust in this way: diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 1c6fec08bc43..e1831cc23d9c 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -625,8 +625,8 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict) if (strict) { trace_page_pool_release(pool, inflight, hold_cnt, release_cnt); - WARN(inflight < 0, "Negative(%d) inflight packet-pages", - inflight); + WARN(inflight < 0, "Pool id(%u): negative(%d) inflight packet-pages", + pool->user.id, inflight); } else { inflight = max(0, inflight); } @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct work_struct *wq) int inflight; inflight = page_pool_release(pool); - if (!inflight) + /* In rare cases, a driver bug may cause inflight to go negative. + * Don't reschedule release if inflight is 0 or negative. + * - If 0, the page_pool has been destroyed + * - if negative, we will never recover + * in both cases no reschedule is necessary. + */ + if (inflight <= 0) return; Thanks, Jason
On Thu, Feb 13, 2025 at 3:43 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Fri, Feb 14, 2025 at 4:14 AM Mina Almasry <almasrymina@google.com> wrote: > > > > On Thu, Feb 13, 2025 at 2:49 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > > > On 2/13/25 6:21 AM, Jason Xing wrote: > > > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > > > > > index 1c6fec08bc43..e1f89a19a6b6 100644 > > > > > --- a/net/core/page_pool.c > > > > > +++ b/net/core/page_pool.c > > > > > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq) > > > > > int inflight; > > > > > > > > > > inflight = page_pool_release(pool); > > > > > - if (!inflight) > > > > > - return; > > > > > > > > > > /* Periodic warning for page pools the user can't see */ > > > > > netdev = READ_ONCE(pool->slow.netdev); > > > > > > > > This causes UaF, as catched by the CI: > > > > > > > > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr > > > > > > > > at this point 'inflight' could be 0 and 'pool' already freed. > > > > > > Oh, right, thanks for catching that. > > > > > > I'm going to use the previous approach (one-liner with a few comments): > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > > > index 1c6fec08bc43..209b5028abd7 100644 > > > --- a/net/core/page_pool.c > > > +++ b/net/core/page_pool.c > > > @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct > > > work_struct *wq) > > > int inflight; > > > > > > inflight = page_pool_release(pool); > > > - if (!inflight) > > > + /* In rare cases, a driver bug may cause inflight to go negative. > > > + * Don't reschedule release if inflight is 0 or negative. > > > + * - If 0, the page_pool has been destroyed > > > + * - if negative, we will never recover > > > + * in both cases no reschedule is necessary. > > > + */ > > > + if (inflight <= 0) > > > return; > > > > > > > I think it could still be good to have us warn once so that this bug > > is not silent. > > Allow me to double-check what you meant here. Applying the above > patch, we do at least see the warning once in > page_pool_release_retry()->page_pool_release()->page_pool_inflight()->WARN() > before stopping the reschedule. > Ah, I see. I had missed indeed that we warn in page_pool_inflight() if we see anything negative there anyway. Nevermind then. Thanks!
On Fri, Feb 14, 2025 at 12:10 PM Mina Almasry <almasrymina@google.com> wrote: > > On Thu, Feb 13, 2025 at 3:43 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Fri, Feb 14, 2025 at 4:14 AM Mina Almasry <almasrymina@google.com> wrote: > > > > > > On Thu, Feb 13, 2025 at 2:49 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > On Thu, Feb 13, 2025 at 4:32 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > > > > > On 2/13/25 6:21 AM, Jason Xing wrote: > > > > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > > > > > > index 1c6fec08bc43..e1f89a19a6b6 100644 > > > > > > --- a/net/core/page_pool.c > > > > > > +++ b/net/core/page_pool.c > > > > > > @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq) > > > > > > int inflight; > > > > > > > > > > > > inflight = page_pool_release(pool); > > > > > > - if (!inflight) > > > > > > - return; > > > > > > > > > > > > /* Periodic warning for page pools the user can't see */ > > > > > > netdev = READ_ONCE(pool->slow.netdev); > > > > > > > > > > This causes UaF, as catched by the CI: > > > > > > > > > > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/990441/34-udpgro-bench-sh/stderr > > > > > > > > > > at this point 'inflight' could be 0 and 'pool' already freed. > > > > > > > > Oh, right, thanks for catching that. > > > > > > > > I'm going to use the previous approach (one-liner with a few comments): > > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > > > > index 1c6fec08bc43..209b5028abd7 100644 > > > > --- a/net/core/page_pool.c > > > > +++ b/net/core/page_pool.c > > > > @@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct > > > > work_struct *wq) > > > > int inflight; > > > > > > > > inflight = page_pool_release(pool); > > > > - if (!inflight) > > > > + /* In rare cases, a driver bug may cause inflight to go negative. > > > > + * Don't reschedule release if inflight is 0 or negative. > > > > + * - If 0, the page_pool has been destroyed > > > > + * - if negative, we will never recover > > > > + * in both cases no reschedule is necessary. > > > > + */ > > > > + if (inflight <= 0) > > > > return; > > > > > > > > > > I think it could still be good to have us warn once so that this bug > > > is not silent. > > > > Allow me to double-check what you meant here. Applying the above > > patch, we do at least see the warning once in > > page_pool_release_retry()->page_pool_release()->page_pool_inflight()->WARN() > > before stopping the reschedule. > > > > Ah, I see. I had missed indeed that we warn in page_pool_inflight() if > we see anything negative there anyway. Nevermind then. Thanks! Okay, then a one-liner patch is enough :) Thanks for the review.
diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 1c6fec08bc43..e1f89a19a6b6 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq) int inflight; inflight = page_pool_release(pool); - if (!inflight) - return; /* Periodic warning for page pools the user can't see */ netdev = READ_ONCE(pool->slow.netdev); if (time_after_eq(jiffies, pool->defer_warn) && - (!netdev || netdev == NET_PTR_POISON)) { + (!netdev || netdev == NET_PTR_POISON) && + inflight != 0) { int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ; pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n", @@ -1126,7 +1125,15 @@ static void page_pool_release_retry(struct work_struct *wq) pool->defer_warn = jiffies + DEFER_WARN_INTERVAL; } - /* Still not ready to be disconnected, retry later */ + /* In rare cases, a driver bug may cause inflight to go negative. + * Don't reschedule release if inflight is 0 or negative. + * - If 0, the page_pool has been destroyed + * - if negative, we will never recover + * in both cases no reschedule is necessary. + */ + if (inflight <= 0) + return; + schedule_delayed_work(&pool->release_dw, DEFER_TIME); }
We noticed the kworker in page_pool_release_retry() was waken up repeatedly and infinitely in production because of the buggy driver causing the inflight less than 0 and warning us in page_pool_inflight()[1]. Since the inflight value goes negative, it means we should not expect the whole page_pool to get back to work normally. This patch mitigates the adverse effect by not rescheduling the kworker when detecting the inflight negative in page_pool_release_retry(). This patch allows to pr_warn() once in page_pool_release_retry() in this case. [1] [Mon Feb 10 20:36:11 2025] ------------[ cut here ]------------ [Mon Feb 10 20:36:11 2025] Negative(-51446) inflight packet-pages ... [Mon Feb 10 20:36:11 2025] Call Trace: [Mon Feb 10 20:36:11 2025] page_pool_release_retry+0x23/0x70 [Mon Feb 10 20:36:11 2025] process_one_work+0x1b1/0x370 [Mon Feb 10 20:36:11 2025] worker_thread+0x37/0x3a0 [Mon Feb 10 20:36:11 2025] kthread+0x11a/0x140 [Mon Feb 10 20:36:11 2025] ? process_one_work+0x370/0x370 [Mon Feb 10 20:36:11 2025] ? __kthread_cancel_work+0x40/0x40 [Mon Feb 10 20:36:11 2025] ret_from_fork+0x35/0x40 [Mon Feb 10 20:36:11 2025] ---[ end trace ebffe800f33e7e34 ]--- Note: before this patch, the above calltrace would flood the dmesg due to repeated reschedule of release_dw kworker. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> --- v2 Link: https://lore.kernel.org/all/20250210130953.26831-1-kerneljasonxing@gmail.com/ 1. add more details in commit message. 2. allow printing once when the inflight is negative. 3. correct the position where to stop the reschedule. Suggested by Mina and Jakub. --- net/core/page_pool.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)