[2/2] rcu: Dump vmalloc memory info safely

Message ID 20230830110402.386898-2-joel@joelfernandes.org (mailing list archive)
State New
Series [1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug

Commit Message

Joel Fernandes Aug. 30, 2023, 11:04 a.m. UTC
From: Zqiang <qiang.zhang1211@gmail.com>

Currently, when call_rcu() is invoked twice on the same rcu_head, the
memory info of the rcu_head object is dumped. If the object was not
allocated from the slab allocator, vmalloc_dump_obj() is invoked, which
needs to take the vmap_area_lock spinlock. Since call_rcu() can be
invoked in interrupt context, this can deadlock on that spinlock.

On a Preempt-RT kernel, the rcutorture test also triggers the following
lockdep warning:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
3 locks held by swapper/0/1:
 #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
irq event stamp: 565512
hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
Preemption disabled at:
[<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xb0
 dump_stack+0x14/0x20
 __might_resched+0x1aa/0x280
 ? __pfx_rcu_torture_err_cb+0x10/0x10
 rt_spin_lock+0x53/0x130
 ? find_vmap_area+0x1f/0x70
 find_vmap_area+0x1f/0x70
 vmalloc_dump_obj+0x20/0x60
 mem_dump_obj+0x22/0x90
 __call_rcu_common+0x5bf/0x940
 ? debug_smp_processor_id+0x1b/0x30
 call_rcu_hurry+0x14/0x20
 rcu_torture_init+0x1f82/0x2370
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_init+0x10/0x10
 do_one_initcall+0x6c/0x300
 ? debug_smp_processor_id+0x1b/0x30
 kernel_init_freeable+0x2b9/0x540
 ? __pfx_kernel_init+0x10/0x10
 kernel_init+0x1f/0x150
 ret_from_fork+0x40/0x50
 ? __pfx_kernel_init+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

The previous patch fixes this by using the deadlock-safe best-effort
version of find_vm_area. However, in case of failure print the fact that
the pointer was a vmalloc pointer so that we print at least something.
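
As an illustration only, here is a small userspace sketch of what a
"deadlock-safe best-effort" lookup can look like. It assumes the safe
variant is trylock-based, which this message does not state. Every name
in it (struct area, area_trylock(), find_area_best_effort(), the
LOOKUP_* values) is hypothetical and is not a kernel API; the
atomic_flag merely stands in for vmap_area_lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vmalloc area record. */
struct area {
	uintptr_t start;
	size_t size;
};

/* Hypothetical stand-ins for vmap_area_lock and the area tree. */
static atomic_flag area_lock = ATOMIC_FLAG_INIT;
static const struct area areas[] = {
	{ 0x1000, 0x2000 },
	{ 0x8000, 0x1000 },
};

enum lookup_result { LOOKUP_FOUND, LOOKUP_MISSING, LOOKUP_BUSY };

/* Try to take the lock once; return true on success, never wait. */
static bool area_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&area_lock,
						  memory_order_acquire);
}

static void area_unlock(void)
{
	atomic_flag_clear_explicit(&area_lock, memory_order_release);
}

/*
 * Best-effort lookup: if the lock is already held, report LOOKUP_BUSY
 * instead of waiting, so a caller that interrupted the lock holder
 * cannot deadlock against it.
 */
static enum lookup_result find_area_best_effort(uintptr_t addr,
						struct area *out)
{
	enum lookup_result res = LOOKUP_MISSING;
	size_t i;

	if (!area_trylock())
		return LOOKUP_BUSY;

	for (i = 0; i < sizeof(areas) / sizeof(areas[0]); i++) {
		if (addr >= areas[i].start &&
		    addr - areas[i].start < areas[i].size) {
			*out = areas[i];	/* copy while the lock is held */
			res = LOOKUP_FOUND;
			break;
		}
	}

	area_unlock();
	return res;
}
```

A caller that gets LOOKUP_BUSY has no details to print, which is the
case the paragraph above addresses: report at least that the pointer is
a vmalloc pointer.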

Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/util.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox Aug. 30, 2023, 12:08 p.m. UTC | #1
On Wed, Aug 30, 2023 at 11:04:00AM +0000, Joel Fernandes (Google) wrote:
> From: Zqiang <qiang.zhang1211@gmail.com>
> 
> Currently, when call_rcu() is invoked twice on the same rcu_head, the
> memory info of the rcu_head object is dumped. If the object was not
> allocated from the slab allocator, vmalloc_dump_obj() is invoked, which
> needs to take the vmap_area_lock spinlock. Since call_rcu() can be
> invoked in interrupt context, this can deadlock on that spinlock.
>
> On a Preempt-RT kernel, the rcutorture test also triggers the following
> lockdep warning:
> 
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> RCU nest depth: 1, expected: 1
> 3 locks held by swapper/0/1:
>  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
>  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
>  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> irq event stamp: 565512
> hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> Preemption disabled at:
> [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x68/0xb0
>  dump_stack+0x14/0x20
>  __might_resched+0x1aa/0x280
>  ? __pfx_rcu_torture_err_cb+0x10/0x10
>  rt_spin_lock+0x53/0x130
>  ? find_vmap_area+0x1f/0x70
>  find_vmap_area+0x1f/0x70
>  vmalloc_dump_obj+0x20/0x60
>  mem_dump_obj+0x22/0x90
>  __call_rcu_common+0x5bf/0x940
>  ? debug_smp_processor_id+0x1b/0x30
>  call_rcu_hurry+0x14/0x20
>  rcu_torture_init+0x1f82/0x2370
>  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>  ? __pfx_rcu_torture_init+0x10/0x10
>  do_one_initcall+0x6c/0x300
>  ? debug_smp_processor_id+0x1b/0x30
>  kernel_init_freeable+0x2b9/0x540
>  ? __pfx_kernel_init+0x10/0x10
>  kernel_init+0x1f/0x150
>  ret_from_fork+0x40/0x50
>  ? __pfx_kernel_init+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
> 
> The previous patch fixes this by using the deadlock-safe best-effort
> version of find_vm_area. However, in case of failure print the fact that
> the pointer was a vmalloc pointer so that we print at least something.
> 
> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: rcu@vger.kernel.org
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Andrew Morton Sept. 3, 2023, 1:28 a.m. UTC | #2
On Wed, 30 Aug 2023 11:04:00 +0000 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:

> Currently, when call_rcu() is invoked twice on the same rcu_head, the
> memory info of the rcu_head object is dumped. If the object was not
> allocated from the slab allocator, vmalloc_dump_obj() is invoked, which
> needs to take the vmap_area_lock spinlock. Since call_rcu() can be
> invoked in interrupt context, this can deadlock on that spinlock.
>
> On a Preempt-RT kernel, the rcutorture test also triggers the following
> lockdep warning:

"possibility of deadlock" sounds like something -stable kernels would
like to have fixed.

Did you consider the desirability of a -stable backport?

If so, are we able to identify a suitable Fixes: target?

Joel Fernandes Sept. 3, 2023, 2:03 a.m. UTC | #3
On Sat, Sep 2, 2023 at 9:28 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 30 Aug 2023 11:04:00 +0000 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
>
> > Currently, when call_rcu() is invoked twice on the same rcu_head, the
> > memory info of the rcu_head object is dumped. If the object was not
> > allocated from the slab allocator, vmalloc_dump_obj() is invoked, which
> > needs to take the vmap_area_lock spinlock. Since call_rcu() can be
> > invoked in interrupt context, this can deadlock on that spinlock.
> >
> > On a Preempt-RT kernel, the rcutorture test also triggers the following
> > lockdep warning:
>
> "possibility of deadlock" sounds like something -stable kernels would
> like to have fixed.
>
> Did you consider the desirability of a -stable backport?
>
> If so, are we able to identify a suitable Fixes: target?

Good point, it should be:
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")

I am currently reworking the patch as Vlad was also concerned about
(the existing) issue of accessing vm_struct fields without holding the
lock [1].

I will add this fixes tag to both patches for the v3 on the respin.

Thanks!

 - Joel

Joel Fernandes Sept. 3, 2023, 2:05 a.m. UTC | #4
On Sat, Sep 2, 2023 at 10:03 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Sat, Sep 2, 2023 at 9:28 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Wed, 30 Aug 2023 11:04:00 +0000 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> >
> > > Currently, when call_rcu() is invoked twice on the same rcu_head, the
> > > memory info of the rcu_head object is dumped. If the object was not
> > > allocated from the slab allocator, vmalloc_dump_obj() is invoked, which
> > > needs to take the vmap_area_lock spinlock. Since call_rcu() can be
> > > invoked in interrupt context, this can deadlock on that spinlock.
> > >
> > > On a Preempt-RT kernel, the rcutorture test also triggers the following
> > > lockdep warning:
> >
> > "possibility of deadlock" sounds like something -stable kernels would
> > like to have fixed.
> >
> > Did you consider the desirability of a -stable backport?
> >
> > If so, are we able to identify a suitable Fixes: target?
>
> Good point, it should be:
> Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
>
> I am currently reworking the patch as Vlad was also concerned about
> (the existing) issue of accessing vm_struct fields without holding the
> lock [1].
>
> I will add this fixes tag to both patches for the v3 on the respin.
>

Sigh, I missed sharing the link to [1]:
[1] https://lore.kernel.org/all/20230901003321.GA3389909@google.com/

thanks,

 - Joel

Patch

diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..406634f26918 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1071,7 +1071,9 @@  void mem_dump_obj(void *object)
 	if (vmalloc_dump_obj(object))
 		return;
 
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
 	else if (object == NULL)
 		type = "NULL pointer";
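
The ordering in the hunk above can be modelled in userspace roughly as
follows. This is a sketch under stated assumptions, not the kernel code:
the address ranges are invented for the model, model_is_vmalloc_addr()
and model_virt_addr_valid() are hypothetical stand-ins for
is_vmalloc_addr() and virt_addr_valid(), and the final "unclassified"
label is a placeholder because the remaining branches of mem_dump_obj()
are not shown in the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented address-space layout for the model; not kernel values. */
#define MODEL_LINEAR_START	0x001000UL
#define MODEL_LINEAR_END	0x100000UL
#define MODEL_VMALLOC_START	0x100000UL
#define MODEL_VMALLOC_END	0x200000UL

/* Pure range check that takes no lock; stands in for is_vmalloc_addr(). */
static bool model_is_vmalloc_addr(uintptr_t addr)
{
	return addr >= MODEL_VMALLOC_START && addr < MODEL_VMALLOC_END;
}

/* Stands in for virt_addr_valid(). */
static bool model_virt_addr_valid(uintptr_t addr)
{
	return addr >= MODEL_LINEAR_START && addr < MODEL_LINEAR_END;
}

/*
 * Pick a coarse label for an address once the detailed dump helpers
 * have declined to describe it. The vmalloc check comes first, so a
 * vmalloc pointer is still named as such when the detailed vmalloc
 * dump could not be produced.
 */
static const char *model_classify(uintptr_t addr)
{
	if (model_is_vmalloc_addr(addr))
		return "vmalloc memory";
	if (model_virt_addr_valid(addr))
		return "non-slab/vmalloc memory";
	if (addr == 0)
		return "NULL pointer";
	return "unclassified";	/* placeholder, not from the patch */
}
```

The point of the extra branch is that it relies only on an address range
comparison, so it is safe in any context, including the interrupt and
atomic contexts described in the commit message.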