
[v2,6/6] workqueue, kasan: avoid alloc_pages() when recording stack

Message ID 20210913112609.2651084-7-elver@google.com
State New
Series stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock

Commit Message

Marco Elver Sept. 13, 2021, 11:26 a.m. UTC
Shuah Khan reported:

 | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
 | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
 | it tries to allocate memory attempting to acquire spinlock in page
 | allocation code while holding workqueue pool raw_spinlock.
 |
 | There are several instances of this problem when block layer tries
 | to __queue_work(). Call trace from one of these instances is below:
 |
 |     kblockd_mod_delayed_work_on()
 |       mod_delayed_work_on()
 |         __queue_delayed_work()
 |           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
 |             insert_work()
 |               kasan_record_aux_stack()
 |                 kasan_save_stack()
 |                   stack_depot_save()
 |                     alloc_pages()
 |                       __alloc_pages()
 |                         get_page_from_freelist()
 |                           rmqueue()
 |                             rmqueue_pcplist()
 |                               local_lock_irqsave(&pagesets.lock, flags);
 |                               [ BUG: Invalid wait context triggered ]

The default kasan_record_aux_stack() calls stack_depot_save() with
GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...).
In general, however, neither GFP_ATOMIC nor GFP_NOWAIT can be used in
certain non-preemptible contexts, including sections that hold a
raw_spin_lock (see gfp.h and commit ab00db216c9c7).
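
For illustration, a minimal sketch of the problematic pattern (the lock
and function names below are generic placeholders, not the actual
workqueue code):

    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <linux/spinlock.h>

    /* Sketch: a page allocation inside a raw_spin_lock section. */
    static void sketch_alloc_under_raw_lock(raw_spinlock_t *lock)
    {
            unsigned long flags;
            struct page *page;

            raw_spin_lock_irqsave(lock, flags);
            /*
             * Even GFP_NOWAIT can reach rmqueue_pcplist() and take
             * local_lock_irqsave(&pagesets.lock, ...), i.e. acquire a
             * non-raw lock while a raw_spin_lock is held -- which
             * CONFIG_PROVE_RAW_LOCK_NESTING reports as an invalid wait
             * context (and which may sleep on PREEMPT_RT).
             */
            page = alloc_pages(GFP_NOWAIT, 0);
            raw_spin_unlock_irqrestore(lock, flags);
            if (page)
                    __free_page(page);
    }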

Fix it by switching to kasan_record_aux_stack_noalloc(), which instructs
stackdepot not to expand its stack storage via alloc_pages() if it runs
out of space.
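
Conceptually, the _noalloc variant introduced earlier in this series
records the same stack trace but passes can_alloc=false down to stack
depot, so only already-allocated depot storage is used. A simplified
sketch (not the exact KASAN code) of what that plumbing amounts to:

    #include <linux/gfp.h>
    #include <linux/kernel.h>
    #include <linux/stackdepot.h>
    #include <linux/stacktrace.h>

    /* Simplified sketch of saving a stack with/without allowing allocation. */
    static depot_stack_handle_t sketch_save_stack(bool can_alloc)
    {
            unsigned long entries[64];
            unsigned int nr_entries;

            nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
            /* can_alloc=false forbids stack depot from calling alloc_pages(). */
            return __stack_depot_save(entries, nr_entries, GFP_NOWAIT, can_alloc);
    }

kasan_record_aux_stack() effectively does the above with can_alloc=true,
and kasan_record_aux_stack_noalloc() with can_alloc=false.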

While there is an increased risk of failing to insert the stack trace,
this is typically unlikely, especially if the same insertion had already
succeeded previously (stack depot hit). For frequent calls from the same
location, it therefore becomes extremely unlikely that
kasan_record_aux_stack_noalloc() fails.
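
For reference, the hit/miss behaviour can be sketched as follows; the
helpers hash_trace(), lookup_bucket(), pool_has_space() and
insert_new_record() are hypothetical stand-ins for lib/stackdepot.c
internals, not its real API:

    #include <linux/stackdepot.h>
    #include <linux/types.h>

    /* Hypothetical sketch of the stack depot hit/miss logic (simplified). */
    static depot_stack_handle_t sketch_depot_save(unsigned long *entries,
                                                  unsigned int nr_entries,
                                                  bool can_alloc)
    {
            u32 hash = hash_trace(entries, nr_entries);
            depot_stack_handle_t handle = lookup_bucket(hash, entries, nr_entries);

            if (handle)
                    return handle;  /* hit: no memory needed at all */
            if (!pool_has_space() && !can_alloc)
                    return 0;       /* miss, no room: fail instead of alloc_pages() */
            /* Miss with room available (or can_alloc=true): store a new record. */
            return insert_new_record(hash, entries, nr_entries, can_alloc);
    }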

Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
Reported-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Marco Elver <elver@google.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Tejun Heo Sept. 13, 2021, 5:03 p.m. UTC | #1
On Mon, Sep 13, 2021 at 01:26:09PM +0200, Marco Elver wrote:
> While there is an increased risk of failing to insert the stack trace,
> this is typically unlikely, especially if the same insertion had already
> succeeded previously (stack depot hit). For frequent calls from the same
> location, it therefore becomes extremely unlikely that
> kasan_record_aux_stack_noalloc() fails.
> 
> Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
> Reported-by: Shuah Khan <skhan@linuxfoundation.org>
> Signed-off-by: Marco Elver <elver@google.com>
> Tested-by: Shuah Khan <skhan@linuxfoundation.org>
> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Acked-by: Tejun Heo <tj@kernel.org>

Please feel free to route with the rest of series or if you want me to take
these through the wq tree, please let me know.

Thanks.
Marco Elver Sept. 13, 2021, 5:58 p.m. UTC | #2
On Mon, 13 Sept 2021 at 19:03, Tejun Heo <tj@kernel.org> wrote:
>
> On Mon, Sep 13, 2021 at 01:26:09PM +0200, Marco Elver wrote:
> > While there is an increased risk of failing to insert the stack trace,
> > this is typically unlikely, especially if the same insertion had already
> > succeeded previously (stack depot hit). For frequent calls from the same
> > location, it therefore becomes extremely unlikely that
> > kasan_record_aux_stack_noalloc() fails.
> >
> > Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
> > Reported-by: Shuah Khan <skhan@linuxfoundation.org>
> > Signed-off-by: Marco Elver <elver@google.com>
> > Tested-by: Shuah Khan <skhan@linuxfoundation.org>
> > Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> Acked-by: Tejun Heo <tj@kernel.org>

Thanks!

> Please feel free to route with the rest of series or if you want me to take
> these through the wq tree, please let me know.

Usually KASAN & stackdepot patches go via the -mm tree. I hope the
1-line change to workqueue won't conflict with other changes pending
in the wq tree. Unless you or Andrew tells us otherwise, I assume
these will at some point appear in -mm.

Thanks,
-- Marco

> Thanks.
>
> --
> tejun
Tejun Heo Sept. 13, 2021, 6:02 p.m. UTC | #3
On Mon, Sep 13, 2021 at 07:58:39PM +0200, Marco Elver wrote:
> > Please feel free to route with the rest of series or if you want me to take
> > these through the wq tree, please let me know.
> 
> Usually KASAN & stackdepot patches go via the -mm tree. I hope the
> 1-line change to workqueue won't conflict with other changes pending
> in the wq tree. Unless you or Andrew tells us otherwise, I assume
> these will at some point appear in -mm.

That part is really unlikely to cause conflicts and -mm sits on top of all
other trees anyway, so it should be completely fine.

Thanks.

Patch

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33a6b4a2443d..9a042a449002 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1350,7 +1350,7 @@  static void insert_work(struct pool_workqueue *pwq, struct work_struct *work,
 	struct worker_pool *pool = pwq->pool;
 
 	/* record the work call stack in order to print it in KASAN reports */
-	kasan_record_aux_stack(work);
+	kasan_record_aux_stack_noalloc(work);
 
 	/* we own @work, set data and link */
 	set_work_pwq(work, pwq, extra_flags);