kfence: use TASK_IDLE when awaiting allocation

Message ID 20210521083209.3740269-1-elver@google.com (mailing list archive)
State New, archived
Series kfence: use TASK_IDLE when awaiting allocation

Commit Message

Marco Elver May 21, 2021, 8:32 a.m. UTC
Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
allocation counts towards load. However, for KFENCE, this does not make
any sense, since there is no busy work we're awaiting.

Instead, use TASK_IDLE via wait_event_idle() to not count towards load.

BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1185565
Fixes: 407f1d8c1b5f ("kfence: await for allocation using wait_event")
Signed-off-by: Marco Elver <elver@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> # v5.12+
---
 mm/kfence/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
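
To illustrate the load accounting this change relies on, here is a minimal user-space sketch, not kernel code. The flag values are copied for illustration only (the authoritative definitions are in include/linux/sched.h), and counts_towards_load() and wakes_on_signal() are hypothetical helpers. They model the assumption that a sleeping task adds to the load average only when TASK_UNINTERRUPTIBLE is set and TASK_NOLOAD is clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values; see include/linux/sched.h for the real definitions. */
#define TASK_INTERRUPTIBLE   0x0001u
#define TASK_UNINTERRUPTIBLE 0x0002u
#define TASK_NOLOAD          0x0400u
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

static_assert((TASK_UNINTERRUPTIBLE & TASK_NOLOAD) == 0,
	      "the two flags must be distinct bits");

/* Hypothetical helper: does a task sleeping in 'state' add to the load average? */
static bool counts_towards_load(unsigned int state)
{
	return (state & TASK_UNINTERRUPTIBLE) != 0 &&
	       (state & TASK_NOLOAD) == 0;
}

/*
 * Hypothetical helper: can an ordinary signal end the sleep?
 * Simplified: fatal-signal handling (TASK_WAKEKILL) is ignored here.
 */
static bool wakes_on_signal(unsigned int state)
{
	return (state & TASK_INTERRUPTIBLE) != 0;
}
```

Under these assumptions, counts_towards_load(TASK_UNINTERRUPTIBLE) is true while counts_towards_load(TASK_IDLE) is false, and wakes_on_signal(TASK_IDLE) is false, so the sleep remains uninterruptible.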

Comments

Hillf Danton May 21, 2021, 9:37 a.m. UTC | #1
On Fri, 21 May 2021 10:32:09 +0200 Marco Elver wrote:
>Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
>allocation counts towards load. However, for KFENCE, this does not make
>any sense, since there is no busy work we're awaiting.

Because of a blocking wq callback, kfence_timer should be queued on an
unbound workqueue in the first place. Feel free to add a followup to
replace system_power_efficient_wq with system_unbound_wq if it makes
sense to you that kfence behaves as correctly as expected independent of
CONFIG_WQ_POWER_EFFICIENT_DEFAULT given "system_power_efficient_wq is
identical to system_wq if 'wq_power_efficient' is disabled."

David Laight May 21, 2021, 9:39 a.m. UTC | #2
From: Marco Elver
> Sent: 21 May 2021 09:32
> 
> Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
> allocation counts towards load. However, for KFENCE, this does not make
> any sense, since there is no busy work we're awaiting.
> 
> Instead, use TASK_IDLE via wait_event_idle() to not count towards load.

Doesn't that let the process be interrupted by a signal?
That is probably not desirable.

There really ought to be a way of sleeping with TASK_UNINTERRUPTIBLE
without changing the load-average.

IIRC the load-average is really intended to include processes
that are waiting for disk - especially for swap.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Marco Elver May 21, 2021, 9:47 a.m. UTC | #3
On Fri, May 21, 2021 at 09:39AM +0000, David Laight wrote:
> From: Marco Elver
> > Sent: 21 May 2021 09:32
> > 
> > Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
> > allocation counts towards load. However, for KFENCE, this does not make
> > any sense, since there is no busy work we're awaiting.
> > 
> > Instead, use TASK_IDLE via wait_event_idle() to not count towards load.
> 
> Doesn't that let the process be interrupted by a signal?
> That is probably not desirable.
> 
> There really ought to be a way of sleeping with TASK_UNINTERRUPTIBLE
> without changing the load-average.

That's what TASK_IDLE is:

	include/linux/sched.h:#define TASK_IDLE                 (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

See https://lore.kernel.org/lkml/alpine.LFD.2.11.1505112154420.1749@ja.home.ssi.bg/T/

Thanks,
-- Marco

> IIRC the load-average is really intended to include processes
> that are waiting for disk - especially for swap.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

David Laight May 21, 2021, 10:15 a.m. UTC | #4
From: Marco Elver
> Sent: 21 May 2021 10:48
> 
> On Fri, May 21, 2021 at 09:39AM +0000, David Laight wrote:
> > From: Marco Elver
> > > Sent: 21 May 2021 09:32
> > >
> > > Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
> > > allocation counts towards load. However, for KFENCE, this does not make
> > > any sense, since there is no busy work we're awaiting.
> > >
> > > Instead, use TASK_IDLE via wait_event_idle() to not count towards load.
> >
> > Doesn't that let the process be interrupted by a signal?
> > That is probably not desirable.
> >
> > There really ought to be a way of sleeping with TASK_UNINTERRUPTIBLE
> > without changing the load-average.
> 
> That's what TASK_IDLE is:
> 
> 	include/linux/sched.h:#define TASK_IDLE                 (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

That's been added since I last tried to stop tasks updating
the load-average :-)

	David

Marco Elver May 21, 2021, 11:17 a.m. UTC | #5
On Fri, 21 May 2021 at 11:37, Hillf Danton <hdanton@sina.com> wrote:
> On Fri, 21 May 2021 10:32:09 +0200 Marco Elver wrote:
> >Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
> >allocation counts towards load. However, for KFENCE, this does not make
> >any sense, since there is no busy work we're awaiting.
>
> Because of a blocking wq callback, kfence_timer should be queued on an
> unbound workqueue in the first place. Feel free to add a followup to
> replace system_power_efficient_wq with system_unbound_wq if it makes
> sense to you that kfence behaves as correctly as expected independent of
> CONFIG_WQ_POWER_EFFICIENT_DEFAULT given "system_power_efficient_wq is
> identical to system_wq if 'wq_power_efficient' is disabled."

Thanks for pointing it out -- I think this makes sense, let's just use
the unbound wq unconditionally. Since it's independent of this patch,
I've sent it separately:
https://lkml.kernel.org/r/20210521111630.472579-1-elver@google.com

Patch

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index e18fbbd5d9b4..4d21ac44d5d3 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -627,10 +627,10 @@  static void toggle_allocation_gate(struct work_struct *work)
 		 * During low activity with no allocations we might wait a
 		 * while; let's avoid the hung task warning.
 		 */
-		wait_event_timeout(allocation_wait, atomic_read(&kfence_allocation_gate),
-				   sysctl_hung_task_timeout_secs * HZ / 2);
+		wait_event_idle_timeout(allocation_wait, atomic_read(&kfence_allocation_gate),
+					sysctl_hung_task_timeout_secs * HZ / 2);
 	} else {
-		wait_event(allocation_wait, atomic_read(&kfence_allocation_gate));
+		wait_event_idle(allocation_wait, atomic_read(&kfence_allocation_gate));
 	}
 
 	/* Disable static key and reset timer. */
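
The timeout passed in the first branch above is half the hung-task timeout, converted from seconds to jiffies. A minimal user-space sketch of that arithmetic follows. The helper name wait_timeout_jiffies() is hypothetical, and the inputs in the usage note (120 seconds, HZ of 250 or 1000) are example values, not taken from the patch.

```c
/*
 * Hypothetical helper mirroring the expression
 *     sysctl_hung_task_timeout_secs * HZ / 2
 * from the hunk above. The multiplication happens first, then the
 * integer division, so the result is truncated toward zero.
 */
static unsigned long wait_timeout_jiffies(unsigned long hung_task_timeout_secs,
					  unsigned long hz)
{
	return hung_task_timeout_secs * hz / 2;
}
```

For example, with a 120 second hung-task timeout this yields 15000 jiffies at HZ=250 and 60000 jiffies at HZ=1000. Both are 60 seconds, which is half the warning threshold.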