diff mbox series

[v3,2/2] rcu: Dump vmalloc memory info safely

Message ID 20230904180806.1002832-2-joel@joelfernandes.org (mailing list archive)
State Accepted
Commit c83ad36a18c02c0f51280b50272327807916987f
Headers show
Series [v3,1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug | expand

Commit Message

Joel Fernandes Sept. 4, 2023, 6:08 p.m. UTC
From: Zqiang <qiang.zhang1211@gmail.com>

Currently, for double invoke call_rcu(), will dump rcu_head objects
memory info, if the objects is not allocated from the slab allocator,
the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
need to be held, since the call_rcu() can be invoked in interrupt context,
therefore, there is a possibility of spinlock deadlock scenarios.

And in Preempt-RT kernel, the rcutorture test also trigger the following
lockdep warning:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
3 locks held by swapper/0/1:
 #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
irq event stamp: 565512
hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
Preemption disabled at:
[<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xb0
 dump_stack+0x14/0x20
 __might_resched+0x1aa/0x280
 ? __pfx_rcu_torture_err_cb+0x10/0x10
 rt_spin_lock+0x53/0x130
 ? find_vmap_area+0x1f/0x70
 find_vmap_area+0x1f/0x70
 vmalloc_dump_obj+0x20/0x60
 mem_dump_obj+0x22/0x90
 __call_rcu_common+0x5bf/0x940
 ? debug_smp_processor_id+0x1b/0x30
 call_rcu_hurry+0x14/0x20
 rcu_torture_init+0x1f82/0x2370
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_init+0x10/0x10
 do_one_initcall+0x6c/0x300
 ? debug_smp_processor_id+0x1b/0x30
 kernel_init_freeable+0x2b9/0x540
 ? __pfx_kernel_init+0x10/0x10
 kernel_init+0x1f/0x150
 ret_from_fork+0x40/0x50
 ? __pfx_kernel_init+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

The previous patch fixes this by using the deadlock-safe best-effort
version of find_vm_area. However, in case of failure print the fact that
the pointer was a vmalloc pointer so that we print at least something.

Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/util.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Lorenzo Stoakes Sept. 5, 2023, 7 a.m. UTC | #1
On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
> From: Zqiang <qiang.zhang1211@gmail.com>
>
> Currently, for double invoke call_rcu(), will dump rcu_head objects
> memory info, if the objects is not allocated from the slab allocator,
> the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> need to be held, since the call_rcu() can be invoked in interrupt context,
> therefore, there is a possibility of spinlock deadlock scenarios.
>
> And in Preempt-RT kernel, the rcutorture test also trigger the following
> lockdep warning:
>
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> RCU nest depth: 1, expected: 1
> 3 locks held by swapper/0/1:
>  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
>  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
>  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> irq event stamp: 565512
> hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> Preemption disabled at:
> [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x68/0xb0
>  dump_stack+0x14/0x20
>  __might_resched+0x1aa/0x280
>  ? __pfx_rcu_torture_err_cb+0x10/0x10
>  rt_spin_lock+0x53/0x130
>  ? find_vmap_area+0x1f/0x70
>  find_vmap_area+0x1f/0x70
>  vmalloc_dump_obj+0x20/0x60
>  mem_dump_obj+0x22/0x90
>  __call_rcu_common+0x5bf/0x940
>  ? debug_smp_processor_id+0x1b/0x30
>  call_rcu_hurry+0x14/0x20
>  rcu_torture_init+0x1f82/0x2370
>  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>  ? __pfx_rcu_torture_init+0x10/0x10
>  do_one_initcall+0x6c/0x300
>  ? debug_smp_processor_id+0x1b/0x30
>  kernel_init_freeable+0x2b9/0x540
>  ? __pfx_kernel_init+0x10/0x10
>  kernel_init+0x1f/0x150
>  ret_from_fork+0x40/0x50
>  ? __pfx_kernel_init+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
>
> The previous patch fixes this by using the deadlock-safe best-effort
> version of find_vm_area. However, in case of failure print the fact that
> the pointer was a vmalloc pointer so that we print at least something.
>
> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: rcu@vger.kernel.org
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> Cc: stable@vger.kernel.org
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  mm/util.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index dd12b9531ac4..406634f26918 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
>  	if (vmalloc_dump_obj(object))
>  		return;
>
> -	if (virt_addr_valid(object))
> +	if (is_vmalloc_addr(object))
> +		type = "vmalloc memory";
> +	else if (virt_addr_valid(object))
>  		type = "non-slab/vmalloc memory";

I think you should update this to say non-slab/non-vmalloc memory (as much
as that description sucks!) as this phrasing in the past meant to say
'non-slab or vmalloc memory' (already confusing phrasing) so better to be
clear.

>  	else if (object == NULL)
>  		type = "NULL pointer";
> --
> 2.42.0.283.g2d96d420d3-goog
>
Joel Fernandes Sept. 5, 2023, 11:48 a.m. UTC | #2
On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
> On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
> > From: Zqiang <qiang.zhang1211@gmail.com>
> >
> > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > memory info, if the objects is not allocated from the slab allocator,
> > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > need to be held, since the call_rcu() can be invoked in interrupt context,
> > therefore, there is a possibility of spinlock deadlock scenarios.
> >
> > And in Preempt-RT kernel, the rcutorture test also trigger the following
> > lockdep warning:
> >
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > preempt_count: 1, expected: 0
> > RCU nest depth: 1, expected: 1
> > 3 locks held by swapper/0/1:
> >  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
> >  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
> >  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> > irq event stamp: 565512
> > hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> > softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> > Preemption disabled at:
> > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x68/0xb0
> >  dump_stack+0x14/0x20
> >  __might_resched+0x1aa/0x280
> >  ? __pfx_rcu_torture_err_cb+0x10/0x10
> >  rt_spin_lock+0x53/0x130
> >  ? find_vmap_area+0x1f/0x70
> >  find_vmap_area+0x1f/0x70
> >  vmalloc_dump_obj+0x20/0x60
> >  mem_dump_obj+0x22/0x90
> >  __call_rcu_common+0x5bf/0x940
> >  ? debug_smp_processor_id+0x1b/0x30
> >  call_rcu_hurry+0x14/0x20
> >  rcu_torture_init+0x1f82/0x2370
> >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >  ? __pfx_rcu_torture_init+0x10/0x10
> >  do_one_initcall+0x6c/0x300
> >  ? debug_smp_processor_id+0x1b/0x30
> >  kernel_init_freeable+0x2b9/0x540
> >  ? __pfx_kernel_init+0x10/0x10
> >  kernel_init+0x1f/0x150
> >  ret_from_fork+0x40/0x50
> >  ? __pfx_kernel_init+0x10/0x10
> >  ret_from_fork_asm+0x1b/0x30
> >  </TASK>
> >
> > The previous patch fixes this by using the deadlock-safe best-effort
> > version of find_vm_area. However, in case of failure print the fact that
> > the pointer was a vmalloc pointer so that we print at least something.
> >
> > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Cc: rcu@vger.kernel.org
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  mm/util.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/util.c b/mm/util.c
> > index dd12b9531ac4..406634f26918 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
> >  	if (vmalloc_dump_obj(object))
> >  		return;
> >
> > -	if (virt_addr_valid(object))
> > +	if (is_vmalloc_addr(object))
> > +		type = "vmalloc memory";
> > +	else if (virt_addr_valid(object))
> >  		type = "non-slab/vmalloc memory";
> 
> I think you should update this to say non-slab/non-vmalloc memory (as much
> as that description sucks!) as this phrasing in the past meant to say
> 'non-slab or vmalloc memory' (already confusing phrasing) so better to be
> clear.

True, though the issue you mentioned it is in existing code, a respin of this
patch could update it to say non-vmalloc. Good point, thanks for reviewing!

 - Joel
Lorenzo Stoakes Sept. 6, 2023, 7:18 p.m. UTC | #3
On Tue, 5 Sept 2023 at 12:48, Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
> > On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
> > > From: Zqiang <qiang.zhang1211@gmail.com>
> > >
> > > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > > memory info, if the objects is not allocated from the slab allocator,
> > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > > need to be held, since the call_rcu() can be invoked in interrupt context,
> > > therefore, there is a possibility of spinlock deadlock scenarios.
> > >
> > > And in Preempt-RT kernel, the rcutorture test also trigger the following
> > > lockdep warning:
> > >
> > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > > preempt_count: 1, expected: 0
> > > RCU nest depth: 1, expected: 1
> > > 3 locks held by swapper/0/1:
> > >  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
> > >  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
> > >  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> > > irq event stamp: 565512
> > > hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> > > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> > > softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> > > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> > > Preemption disabled at:
> > > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > > Call Trace:
> > >  <TASK>
> > >  dump_stack_lvl+0x68/0xb0
> > >  dump_stack+0x14/0x20
> > >  __might_resched+0x1aa/0x280
> > >  ? __pfx_rcu_torture_err_cb+0x10/0x10
> > >  rt_spin_lock+0x53/0x130
> > >  ? find_vmap_area+0x1f/0x70
> > >  find_vmap_area+0x1f/0x70
> > >  vmalloc_dump_obj+0x20/0x60
> > >  mem_dump_obj+0x22/0x90
> > >  __call_rcu_common+0x5bf/0x940
> > >  ? debug_smp_processor_id+0x1b/0x30
> > >  call_rcu_hurry+0x14/0x20
> > >  rcu_torture_init+0x1f82/0x2370
> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> > >  ? __pfx_rcu_torture_init+0x10/0x10
> > >  do_one_initcall+0x6c/0x300
> > >  ? debug_smp_processor_id+0x1b/0x30
> > >  kernel_init_freeable+0x2b9/0x540
> > >  ? __pfx_kernel_init+0x10/0x10
> > >  kernel_init+0x1f/0x150
> > >  ret_from_fork+0x40/0x50
> > >  ? __pfx_kernel_init+0x10/0x10
> > >  ret_from_fork_asm+0x1b/0x30
> > >  </TASK>
> > >
> > > The previous patch fixes this by using the deadlock-safe best-effort
> > > version of find_vm_area. However, in case of failure print the fact that
> > > the pointer was a vmalloc pointer so that we print at least something.
> > >
> > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: rcu@vger.kernel.org
> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  mm/util.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/util.c b/mm/util.c
> > > index dd12b9531ac4..406634f26918 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
> > >     if (vmalloc_dump_obj(object))
> > >             return;
> > >
> > > -   if (virt_addr_valid(object))
> > > +   if (is_vmalloc_addr(object))
> > > +           type = "vmalloc memory";
> > > +   else if (virt_addr_valid(object))
> > >             type = "non-slab/vmalloc memory";
> >
> > I think you should update this to say non-slab/non-vmalloc memory (as much
> > as that description sucks!) as this phrasing in the past meant to say
> > 'non-slab or vmalloc memory' (already confusing phrasing) so better to be
> > clear.
>
> True, though the issue you mentioned it is in existing code, a respin of this
> patch could update it to say non-vmalloc. Good point, thanks for reviewing!

No it's not, you're changing the meaning, because you changed the code
that determines the output...

This has been merged now despite my outstanding comments (!) so I
guess I'll have to send a follow up patch to address this.

>
>  - Joel
>



--
Lorenzo Stoakes
https://ljs.io
Vlastimil Babka Sept. 7, 2023, 7:10 a.m. UTC | #4
On 9/6/23 21:18, Lorenzo Stoakes wrote:
> On Tue, 5 Sept 2023 at 12:48, Joel Fernandes <joel@joelfernandes.org> wrote:
>>
>> On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
>> > On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
>> > > From: Zqiang <qiang.zhang1211@gmail.com>
>> > >
>> > > Currently, for double invoke call_rcu(), will dump rcu_head objects
>> > > memory info, if the objects is not allocated from the slab allocator,
>> > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
>> > > need to be held, since the call_rcu() can be invoked in interrupt context,
>> > > therefore, there is a possibility of spinlock deadlock scenarios.
>> > >
>> > > And in Preempt-RT kernel, the rcutorture test also trigger the following
>> > > lockdep warning:
>> > >
>> > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
>> > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
>> > > preempt_count: 1, expected: 0
>> > > RCU nest depth: 1, expected: 1
>> > > 3 locks held by swapper/0/1:
>> > >  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
>> > >  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
>> > >  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
>> > > irq event stamp: 565512
>> > > hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
>> > > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
>> > > softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
>> > > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
>> > > Preemption disabled at:
>> > > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
>> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
>> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
>> > > Call Trace:
>> > >  <TASK>
>> > >  dump_stack_lvl+0x68/0xb0
>> > >  dump_stack+0x14/0x20
>> > >  __might_resched+0x1aa/0x280
>> > >  ? __pfx_rcu_torture_err_cb+0x10/0x10
>> > >  rt_spin_lock+0x53/0x130
>> > >  ? find_vmap_area+0x1f/0x70
>> > >  find_vmap_area+0x1f/0x70
>> > >  vmalloc_dump_obj+0x20/0x60
>> > >  mem_dump_obj+0x22/0x90
>> > >  __call_rcu_common+0x5bf/0x940
>> > >  ? debug_smp_processor_id+0x1b/0x30
>> > >  call_rcu_hurry+0x14/0x20
>> > >  rcu_torture_init+0x1f82/0x2370
>> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>> > >  ? __pfx_rcu_torture_init+0x10/0x10
>> > >  do_one_initcall+0x6c/0x300
>> > >  ? debug_smp_processor_id+0x1b/0x30
>> > >  kernel_init_freeable+0x2b9/0x540
>> > >  ? __pfx_kernel_init+0x10/0x10
>> > >  kernel_init+0x1f/0x150
>> > >  ret_from_fork+0x40/0x50
>> > >  ? __pfx_kernel_init+0x10/0x10
>> > >  ret_from_fork_asm+0x1b/0x30
>> > >  </TASK>
>> > >
>> > > The previous patch fixes this by using the deadlock-safe best-effort
>> > > version of find_vm_area. However, in case of failure print the fact that
>> > > the pointer was a vmalloc pointer so that we print at least something.
>> > >
>> > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
>> > > Cc: Paul E. McKenney <paulmck@kernel.org>
>> > > Cc: rcu@vger.kernel.org
>> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
>> > > Cc: stable@vger.kernel.org
>> > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
>> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> > > ---
>> > >  mm/util.c | 4 +++-
>> > >  1 file changed, 3 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/mm/util.c b/mm/util.c
>> > > index dd12b9531ac4..406634f26918 100644
>> > > --- a/mm/util.c
>> > > +++ b/mm/util.c
>> > > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
>> > >     if (vmalloc_dump_obj(object))
>> > >             return;
>> > >
>> > > -   if (virt_addr_valid(object))
>> > > +   if (is_vmalloc_addr(object))
>> > > +           type = "vmalloc memory";
>> > > +   else if (virt_addr_valid(object))
>> > >             type = "non-slab/vmalloc memory";
>> >
>> > I think you should update this to say non-slab/non-vmalloc memory (as much
>> > as that description sucks!) as this phrasing in the past meant to say
>> > 'non-slab or vmalloc memory' (already confusing phrasing) so better to be
>> > clear.
>>
>> True, though the issue you mentioned it is in existing code, a respin of this
>> patch could update it to say non-vmalloc. Good point, thanks for reviewing!
> 
> No it's not, you're changing the meaning, because you changed the code
> that determines the output...

I think it has always meant (but clearly it's not unambiguously worded) "not
slab && not vmalloc", that is before and after this patch. Only in case
patch 1 is applied and patch 2 not, can the output be wrong in that a
vmalloc pointer will (in case of trylock fail) be classified as "not slab &&
not vmalloc", but seems fine to me after patch 2.

I guess if we wanted, we could also rewrite it to be more like the kmem
check in the beginning of mem_dump_obj(), so there would be:

if (is_vmalloc_addr(...)) {
    vmalloc_dump_obj(...);
    return;
}

where vmalloc_dump_obj() itself would print at least "vmalloc memory" with
no further details in case of trylock failure.

that assumes is_vmalloc_addr() is guaranteed to be true for all addresses
that __find_vmap_area resolves, otherwise it could miss something compared
to current code. Is it guaranteed?

> This has been merged now despite my outstanding comments (!) so I
> guess I'll have to send a follow up patch to address this.
> 
>>
>>  - Joel
>>
> 
> 
> 
> --
> Lorenzo Stoakes
> https://ljs.io
Joel Fernandes Sept. 8, 2023, 12:26 a.m. UTC | #5
On Thu, Sep 07, 2023 at 09:10:38AM +0200, Vlastimil Babka wrote:
> On 9/6/23 21:18, Lorenzo Stoakes wrote:
> > On Tue, 5 Sept 2023 at 12:48, Joel Fernandes <joel@joelfernandes.org> wrote:
> >>
> >> On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
> >> > On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
> >> > > From: Zqiang <qiang.zhang1211@gmail.com>
> >> > >
> >> > > Currently, for double invoke call_rcu(), will dump rcu_head objects
> >> > > memory info, if the objects is not allocated from the slab allocator,
> >> > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> >> > > need to be held, since the call_rcu() can be invoked in interrupt context,
> >> > > therefore, there is a possibility of spinlock deadlock scenarios.
> >> > >
> >> > > And in Preempt-RT kernel, the rcutorture test also trigger the following
> >> > > lockdep warning:
> >> > >
> >> > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> >> > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> >> > > preempt_count: 1, expected: 0
> >> > > RCU nest depth: 1, expected: 1
> >> > > 3 locks held by swapper/0/1:
> >> > >  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
> >> > >  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
> >> > >  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> >> > > irq event stamp: 565512
> >> > > hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> >> > > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> >> > > softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> >> > > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> >> > > Preemption disabled at:
> >> > > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> >> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> >> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> >> > > Call Trace:
> >> > >  <TASK>
> >> > >  dump_stack_lvl+0x68/0xb0
> >> > >  dump_stack+0x14/0x20
> >> > >  __might_resched+0x1aa/0x280
> >> > >  ? __pfx_rcu_torture_err_cb+0x10/0x10
> >> > >  rt_spin_lock+0x53/0x130
> >> > >  ? find_vmap_area+0x1f/0x70
> >> > >  find_vmap_area+0x1f/0x70
> >> > >  vmalloc_dump_obj+0x20/0x60
> >> > >  mem_dump_obj+0x22/0x90
> >> > >  __call_rcu_common+0x5bf/0x940
> >> > >  ? debug_smp_processor_id+0x1b/0x30
> >> > >  call_rcu_hurry+0x14/0x20
> >> > >  rcu_torture_init+0x1f82/0x2370
> >> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >> > >  ? __pfx_rcu_torture_init+0x10/0x10
> >> > >  do_one_initcall+0x6c/0x300
> >> > >  ? debug_smp_processor_id+0x1b/0x30
> >> > >  kernel_init_freeable+0x2b9/0x540
> >> > >  ? __pfx_kernel_init+0x10/0x10
> >> > >  kernel_init+0x1f/0x150
> >> > >  ret_from_fork+0x40/0x50
> >> > >  ? __pfx_kernel_init+0x10/0x10
> >> > >  ret_from_fork_asm+0x1b/0x30
> >> > >  </TASK>
> >> > >
> >> > > The previous patch fixes this by using the deadlock-safe best-effort
> >> > > version of find_vm_area. However, in case of failure print the fact that
> >> > > the pointer was a vmalloc pointer so that we print at least something.
> >> > >
> >> > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> >> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> >> > > Cc: rcu@vger.kernel.org
> >> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> >> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> >> > > Cc: stable@vger.kernel.org
> >> > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> >> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >> > > ---
> >> > >  mm/util.c | 4 +++-
> >> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> >> > >
> >> > > diff --git a/mm/util.c b/mm/util.c
> >> > > index dd12b9531ac4..406634f26918 100644
> >> > > --- a/mm/util.c
> >> > > +++ b/mm/util.c
> >> > > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
> >> > >     if (vmalloc_dump_obj(object))
> >> > >             return;
> >> > >
> >> > > -   if (virt_addr_valid(object))
> >> > > +   if (is_vmalloc_addr(object))
> >> > > +           type = "vmalloc memory";
> >> > > +   else if (virt_addr_valid(object))
> >> > >             type = "non-slab/vmalloc memory";
> >> >
> >> > I think you should update this to say non-slab/non-vmalloc memory (as much
> >> > as that description sucks!) as this phrasing in the past meant to say
> >> > 'non-slab or vmalloc memory' (already confusing phrasing) so better to be
> >> > clear.
> >>
> >> True, though the issue you mentioned it is in existing code, a respin of this
> >> patch could update it to say non-vmalloc. Good point, thanks for reviewing!
> > 
> > No it's not, you're changing the meaning, because you changed the code
> > that determines the output...
> 
> I think it has always meant (but clearly it's not unambiguously worded) "not
> slab && not vmalloc", that is before and after this patch. Only in case
> patch 1 is applied and patch 2 not, can the output be wrong in that a
> vmalloc pointer will (in case of trylock fail) be classified as "not slab &&
> not vmalloc", but seems fine to me after patch 2.
> 
> I guess if we wanted, we could also rewrite it to be more like the kmem
> check in the beginning of mem_dump_obj(), so there would be:
> 
> if (is_vmalloc_addr(...)) {
>     vmalloc_dump_obj(...);
>     return;
> }
> 
> where vmalloc_dump_obj() itself would print at least "vmalloc memory" with
> no further details in case of trylock failure.
> 
> that assumes is_vmalloc_addr() is guaranteed to be true for all addresses
> that __find_vmap_area resolves, otherwise it could miss something compared
> to current code. Is it guaranteed?

It is guaranteed based on my reading of the code. But maybe it may aid
additional vmalloc-internals debugging if for some reason the address of the
object stored in the vmalloc data structures is out of bound for some reason
and the lookup actually succeded. That's just a hypothetical situation though
and I don't think that that can actually happen.

thanks,

 - Joel
diff mbox series

Patch

diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..406634f26918 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1071,7 +1071,9 @@  void mem_dump_obj(void *object)
 	if (vmalloc_dump_obj(object))
 		return;
 
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
 	else if (object == NULL)
 		type = "NULL pointer";