diff mbox series

[v2] mm/kmemleak: Don't hold kmemleak_lock when calling printk()

Message ID 20240307184707.961255-1-longman@redhat.com (mailing list archive)
State New
Headers show
Series [v2] mm/kmemleak: Don't hold kmemleak_lock when calling printk()

Commit Message

Waiman Long March 7, 2024, 6:47 p.m. UTC
When some error conditions happen (like OOM), some kmemleak functions
call printk() to dump out useful debugging information while holding
kmemleak_lock. This may cause a deadlock, as printk() may itself need
to allocate memory, leading to a nested create_object() call that
acquires kmemleak_lock again.

An abbreviated lockdep splat is as follows:

[ 3991.452440] ======================================================
[ 3991.452441] WARNING: possible circular locking dependency detected
[ 3991.452442] 4.18.0-513.el8.x86_64+debug #1 Not tainted
[ 3991.452444] ------------------------------------------------------
[ 3991.452445] kworker/21:1H/436 is trying to acquire lock:
[ 3991.452446] ffffffff8b64c460 (console_owner) {....}-{0:0}, at: console_unlock+0x3dc/0xaa0
[ 3991.452452]
[ 3991.452453] but task is already holding lock:
[ 3991.452454] ffffffff8bd2b138 (kmemleak_lock){..}-{2:2}, at: create_object+0x4ba/0xac0
[ 3991.452459]
[ 3991.452460] which lock already depends on the new lock.
  :
[ 3991.452556] other info that might help us debug this:
[ 3991.452557]
[ 3991.452558] Chain exists of:
[ 3991.452559] console_owner -> &port->lock --> kmemleak_lock
[ 3991.452565]
[ 3991.452566] Possible unsafe locking scenario:
[ 3991.452566]
[ 3991.452567]        CPU0                    CPU1
[ 3991.452568]        ----                    ----
[ 3991.452569]   lock(kmemleak_lock);
[ 3991.452572]                                lock(&port->lock);
[ 3991.452574]                                lock(kmemleak_lock);
[ 3991.452577]   lock(console_owner);
[ 3991.452579]
[ 3991.452580] *** DEADLOCK ***
[ 3991.452582] 7 locks held by kworker/21:1H/436:
[ 3991.452582] #0: ff110003ec46b548 ((wq_completion)kblockd){..}-{0:0}, at: process_one_work+0x816/0x17e0
[ 3991.452588] #1: ffa0000004617e00 ((work_completion)(&(&hctx->run_work)->work)){..}-{0:0}, at: process_one_work+0x84a/0x17e0
[ 3991.452594] #2: ffffffff8ba0b440 (rcu_read_lock) {....}-{1:2}, at: hctx_lock+0x6d/0x190
[ 3991.452599] #3: ff110001906ad818 (&host->lock){..}-{2:2}, at: ata_scsi_queuecmd+0x87/0x180 [libata]
[ 3991.452604] #4: ffffffff8bad8118 (radix_lock){..}-{2:2}, at: add_dma_entry+0x20f/0x4e0
[ 3991.452609] #5: ffffffff8bd2b138 (kmemleak_lock){..}-{2:2}, at: create_object+0x4ba/0xac0
[ 3991.452614] #6: ffffffff8b9ccae0 (console_lock){..}-{0:0}, at: vprintk_emit+0x1f5/0x420
[ 3991.452619]
[ 3991.452620] stack backtrace:
[ 3991.452621] CPU: 21 PID: 436 Comm: kworker/21:1H Kdump: loaded Not tainted 4.18.0-513.el8.x86_64+debug #1
[ 3991.452622] Hardware name: Lenovo ThinkSystem SR650 V2/7Z73CTO1WW, BIOS AFE118M-1.32 06/29/2022
[ 3991.452623] Workqueue: kblockd blk_mq_run_work_fn
[ 3991.452625] Call Trace:
[ 3991.452626] dump_stack+0x5c/0x80
[ 3991.452627] check_noncircular+0x283/0x320
[ 3991.452631] check_prevs_add+0x3fa/0x18b0
[ 3991.452635] __lock_acquire+0x21b6/0x2b70
[ 3991.452637] lock_acquire+0x1db/0x620
[ 3991.452640] console_unlock+0x44b/0xaa0
[ 3991.452644] vprintk_emit+0x1fe/0x420
[ 3991.452645] printk+0x9f/0xc9
[ 3991.452648] create_object.cold.19+0x13/0x86
[ 3991.452650] slab_post_alloc_hook+0x66/0x3b0
[ 3991.452652] kmem_cache_alloc+0x155/0x360
[ 3991.452653] radix_tree_node_alloc.constprop.7+0x172/0x2f0
[ 3991.452654] radix_tree_insert+0x197/0x580
[ 3991.452657] add_dma_entry+0x224/0x4e0
[ 3991.452659] debug_dma_map_sg+0x5d7/0xc10
[ 3991.452661] dma_map_sg_attrs+0xc7/0x190
[ 3991.452662] ata_qc_issue+0x65c/0xd60 [libata]
[ 3991.452665] __ata_scsi_queuecmd+0x45f/0xc40 [libata]
[ 3991.452666] ata_scsi_queuecmd+0xa5/0x180 [libata]
[ 3991.452667] scsi_queue_rq+0x16bc/0x3200
[ 3991.452668] blk_mq_dispatch_rq_list+0x3a3/0x2100
[ 3991.452675] blk_mq_do_dispatch_sched+0x72c/0xac0
[ 3991.452679] __blk_mq_sched_dispatch_requests+0x293/0x3f0
[ 3991.452682] blk_mq_sched_dispatch_requests+0xd0/0x130
[ 3991.452683] __blk_mq_run_hw_queue+0xa7/0x110
[ 3991.452686] process_one_work+0x93d/0x17e0
[ 3991.452688] worker_thread+0x87/0xb50
[ 3991.452691] kthread+0x334/0x3f0
[ 3991.452693] ret_from_fork+0x24/0x50
[ 3991.865403] kmemleak: Automatic memory scanning thread ended

Fix this deadlock issue by making sure that printk() is only called
after releasing the kmemleak_lock.
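
In outline, the fix pins the object with get_object(), drops kmemleak_lock
around the printk()-based reporting and then re-takes the lock. The real
changes are in __lookup_object() and __link_object() (see the patch below);
the following is only a simplified sketch of the pattern, with an
illustrative helper name and relying on mm/kmemleak.c internals:

/*
 * Simplified sketch only -- not part of the patch. Assumes the
 * mm/kmemleak.c environment (kmemleak_lock, get_object(), put_object(),
 * kmemleak_warn() and dump_object_info()).
 */
static void report_object_found_by_alias(struct kmemleak_object *object,
					 unsigned long ptr)
{
	/* Pin the object so it cannot be freed once the lock is dropped. */
	if (!get_object(object))
		return;

	/*
	 * Drop kmemleak_lock before calling into printk(): printk() may
	 * allocate memory and re-enter create_object(), which would try
	 * to take kmemleak_lock again.
	 */
	raw_spin_unlock(&kmemleak_lock);

	kmemleak_warn("Found object by alias at 0x%08lx\n", ptr);
	dump_object_info(object);

	put_object(object);
	raw_spin_lock(&kmemleak_lock);
}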

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/kmemleak.c | 62 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 18 deletions(-)

 [v2] Add lockdep splat & don't hold object->lock when calling
      dump_object_info()

Comments

Waiman Long March 7, 2024, 6:49 p.m. UTC | #1
On 3/7/24 13:47, Waiman Long wrote:
> When some error conditions happen (like OOM), some kmemleak functions
> call printk() to dump out some useful debugging information while holding
> the kmemleak_lock. This may cause deadlock as the printk() function
> may need to allocate additional memory leading to a create_object()
> call acquiring kmemleak_lock again.
>
> An abbreviated lockdep splat is as follows:

FYI, the full lockdep splat is:

[ 3991.084294] Out of memory: Killed process 30721 (mandb) total-vm:38708kB, anon-rss:4kB, file-rss:1656kB, shmem-rss:0kB, UID:0 pgtables:112kB oom_score_adj:0
[ 3991.433640] kmemleak: Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE
[ 3991.445095] kmemleak: Cannot allocate a kmemleak_object structure
[ 3991.447309] kmemleak: Cannot allocate a kmemleak_object structure
[ 3991.448306] kmemleak: Cannot allocate a kmemleak_object structure
[ 3991.448396] kmemleak: Kernel memory leak detector disabled
[ 3991.452437]
[ 3991.452440] ======================================================
[ 3991.452441] WARNING: possible circular locking dependency detected
[ 3991.452442] 4.18.0-513.el8.x86_64+debug #1 Not tainted
[ 3991.452444] ------------------------------------------------------
[ 3991.452445] kworker/21:1H/436 is trying to acquire lock:
[ 3991.452446] ffffffff8b64c460 (console_owner) {....}-{0:0}, at: console_unlock+0x3dc/0xaa0
[ 3991.452452]
[ 3991.452453] but task is already holding lock:
[ 3991.452454] ffffffff8bd2b138 (kmemleak_lock){..}-{2:2}, at: create_object+0x4ba/0xac0
[ 3991.452459]
[ 3991.452460] which lock already depends on the new lock.
[ 3991.452461]
[ 3991.452461]
[ 3991.452462] the existing dependency chain (in reverse order) is:
[ 3991.452463]
[ 3991.452464] #3 (kmemleak_lock){.-.}-{2:2}:
[ 3991.452469] lock_acquire+0x1db/0x620
[ 3991.452470] _raw_spin_lock_irqsave+0x4c/0x90
[ 3991.452471] create_object+0x3fc/0xac0
[ 3991.452472] slab_post_alloc_hook+0x66/0x3b0
[ 3991.452473] __kmalloc+0x129/0x270
[ 3991.452474] __tty_buffer_request_room+0x1b4/0x550
[ 3991.452475] tty_insert_flip_string_fixed_flag.part.4+0x79/0x2e0
[ 3991.452476] tty_insert_flip_string_and_push_buffer+0x3e/0x150
[ 3991.452477] n_tty_write+0x48c/0xf10
[ 3991.452478] tty_write+0x389/0x970
[ 3991.452479] vfs_write+0x157/0x460
[ 3991.452480] redirected_tty_write+0x6b/0xb0
[ 3991.452481] do_iter_write+0x382/0x540
[ 3991.452482] vfs_writev+0x176/0x2e0
[ 3991.452483] do_writev+0xc1/0x220
[ 3991.452484] do_syscall_64+0xa5/0x450
[ 3991.452485] entry_SYSCALL_64_after_hwframe+0x66/0xdb
[ 3991.452486]
[ 3991.452486] #2 (&port->lock){..}-{2:2}:
[ 3991.452490] lock_acquire+0x1db/0x620
[ 3991.452491] _raw_spin_lock_irqsave+0x4c/0x90
[ 3991.452492] tty_port_tty_get+0x1d/0x80
[ 3991.452493] tty_port_default_wakeup+0xb/0x30
[ 3991.452494] serial8250_tx_chars+0x3df/0x980
[ 3991.452495] serial8250_handle_irq.part.14+0x145/0x220
[ 3991.452496] serial8250_default_handle_irq+0x82/0xe0
[ 3991.452497] serial8250_interrupt+0xde/0x1b0
[ 3991.452498] __handle_irq_event_percpu+0xfc/0x8a0
[ 3991.452499] handle_irq_event_percpu+0x70/0x140
[ 3991.452500] handle_irq_event+0xa5/0x135
[ 3991.452501] handle_edge_irq+0x20a/0xa40
[ 3991.452502] handle_irq+0x3e/0x60
[ 3991.452503] do_IRQ+0x82/0x200
[ 3991.452504] ret_from_intr+0x0/0x22
[ 3991.452505] cpuidle_enter_state+0x297/0x17f0
[ 3991.452506] cpuidle_enter+0x50/0xb0
[ 3991.452507] do_idle+0x4ed/0x630
[ 3991.452508] cpu_startup_entry+0xcb/0xd4
[ 3991.452509] start_secondary+0x3fb/0x540
[ 3991.452510] secondary_startup_64_no_verify+0xd2/0xdb
[ 3991.452511]
[ 3991.452512] -> #1 (&port_lock_key){.-.}-{2:2}:
[ 3991.452516] lock_acquire+0x1db/0x620
[ 3991.452517] _raw_spin_lock_irqsave+0x4c/0x90
[ 3991.452518] serial8250_console_write+0x12c/0x8c0
[ 3991.452519] console_unlock+0x73d/0xaa0
[ 3991.452520] vprintk_emit+0x1fe/0x420
[ 3991.452520] printk+0x9f/0xc9
[ 3991.452521] lockdep_init+0xc/0xb6
[ 3991.452522] start_kernel+0x523/0x7d3
[ 3991.452523] secondary_startup_64_no_verify+0xd2/0xdb
[ 3991.452524]
[ 3991.452525] #0 (console_owner){....}-{0:0}:
[ 3991.452529] check_prevs_add+0x3fa/0x18b0
[ 3991.452530] __lock_acquire+0x21b6/0x2b70
[ 3991.452531] lock_acquire+0x1db/0x620
[ 3991.452532] console_unlock+0x44b/0xaa0
[ 3991.452533] vprintk_emit+0x1fe/0x420
[ 3991.452533] printk+0x9f/0xc9
[ 3991.452534] create_object.cold.19+0x13/0x86
[ 3991.452535] slab_post_alloc_hook+0x66/0x3b0
[ 3991.452536] kmem_cache_alloc+0x155/0x360
[ 3991.452537] radix_tree_node_alloc.constprop.7+0x172/0x2f0
[ 3991.452538] radix_tree_insert+0x197/0x580
[ 3991.452539] add_dma_entry+0x224/0x4e0
[ 3991.452540] debug_dma_map_sg+0x5d7/0xc10
[ 3991.452541] dma_map_sg_attrs+0xc7/0x190
[ 3991.452542] ata_qc_issue+0x65c/0xd60 [libata]
[ 3991.452544] __ata_scsi_queuecmd+0x45f/0xc40 [libata]
[ 3991.452545] ata_scsi_queuecmd+0xa5/0x180 [libata]
[ 3991.452546] scsi_queue_rq+0x16bc/0x3200
[ 3991.452547] blk_mq_dispatch_rq_list+0x3a3/0x2100
[ 3991.452548] blk_mq_do_dispatch_sched+0x72c/0xac0
[ 3991.452549] __blk_mq_sched_dispatch_requests+0x293/0x3f0
[ 3991.452550] blk_mq_sched_dispatch_requests+0xd0/0x130
[ 3991.452551] __blk_mq_run_hw_queue+0xa7/0x110
[ 3991.452552] process_one_work+0x93d/0x17e0
[ 3991.452553] worker_thread+0x87/0xb50
[ 3991.452554] kthread+0x334/0x3f0
[ 3991.452555] ret_from_fork+0x24/0x50
[ 3991.452555]
[ 3991.452556] other info that might help us debug this:
[ 3991.452557]
[ 3991.452558] Chain exists of:
[ 3991.452559] console_owner -> &port->lock --> kmemleak_lock
[ 3991.452565]
[ 3991.452566] Possible unsafe locking scenario:
[ 3991.452566]
[ 3991.452567]        CPU0                    CPU1
[ 3991.452568]        ----                    ----
[ 3991.452569]   lock(kmemleak_lock);
[ 3991.452572]                                lock(&port->lock);
[ 3991.452574]                                lock(kmemleak_lock);
[ 3991.452577]   lock(console_owner);
[ 3991.452579]
[ 3991.452580] *** DEADLOCK ***
[ 3991.452581]
[ 3991.452582] 7 locks held by kworker/21:1H/436:
[ 3991.452582] #0: ff110003ec46b548 ((wq_completion)kblockd){..}-{0:0}, at: process_one_work+0x816/0x17e0
[ 3991.452588] #1: ffa0000004617e00 ((work_completion)(&(&hctx->run_work)->work)){..}-{0:0}, at: process_one_work+0x84a/0x17e0
[ 3991.452594] #2: ffffffff8ba0b440 (rcu_read_lock) {....}-{1:2}, at: hctx_lock+0x6d/0x190
[ 3991.452599] #3: ff110001906ad818 (&host->lock){..}-{2:2}, at: ata_scsi_queuecmd+0x87/0x180 [libata]
[ 3991.452604] #4: ffffffff8bad8118 (radix_lock){..}-{2:2}, at: add_dma_entry+0x20f/0x4e0
[ 3991.452609] #5: ffffffff8bd2b138 (kmemleak_lock){..}-{2:2}, at: create_object+0x4ba/0xac0
[ 3991.452614] #6: ffffffff8b9ccae0 (console_lock){..}-{0:0}, at: vprintk_emit+0x1f5/0x420
[ 3991.452619]
[ 3991.452620] stack backtrace:
[ 3991.452621] CPU: 21 PID: 436 Comm: kworker/21:1H Kdump: loaded Not tainted 4.18.0-513.el8.x86_64+debug #1
[ 3991.452622] Hardware name: Lenovo ThinkSystem SR650 V2/7Z73CTO1WW, BIOS AFE118M-1.32 06/29/2022
[ 3991.452623] Workqueue: kblockd blk_mq_run_work_fn
[ 3991.452625] Call Trace:
[ 3991.452626] dump_stack+0x5c/0x80
[ 3991.452627] check_noncircular+0x283/0x320
[ 3991.452631] check_prevs_add+0x3fa/0x18b0
[ 3991.452635] __lock_acquire+0x21b6/0x2b70
[ 3991.452637] lock_acquire+0x1db/0x620
[ 3991.452640] console_unlock+0x44b/0xaa0
[ 3991.452644] vprintk_emit+0x1fe/0x420
[ 3991.452645] printk+0x9f/0xc9
[ 3991.452648] create_object.cold.19+0x13/0x86
[ 3991.452650] slab_post_alloc_hook+0x66/0x3b0
[ 3991.452652] kmem_cache_alloc+0x155/0x360
[ 3991.452653] radix_tree_node_alloc.constprop.7+0x172/0x2f0
[ 3991.452654] radix_tree_insert+0x197/0x580
[ 3991.452657] add_dma_entry+0x224/0x4e0
[ 3991.452659] debug_dma_map_sg+0x5d7/0xc10
[ 3991.452661] dma_map_sg_attrs+0xc7/0x190
[ 3991.452662] ata_qc_issue+0x65c/0xd60 [libata]
[ 3991.452665] __ata_scsi_queuecmd+0x45f/0xc40 [libata]
[ 3991.452666] ata_scsi_queuecmd+0xa5/0x180 [libata]
[ 3991.452667] scsi_queue_rq+0x16bc/0x3200
[ 3991.452668] blk_mq_dispatch_rq_list+0x3a3/0x2100
[ 3991.452675] blk_mq_do_dispatch_sched+0x72c/0xac0
[ 3991.452679] __blk_mq_sched_dispatch_requests+0x293/0x3f0
[ 3991.452682] blk_mq_sched_dispatch_requests+0xd0/0x130
[ 3991.452683] __blk_mq_run_hw_queue+0xa7/0x110
[ 3991.452686] process_one_work+0x93d/0x17e0
[ 3991.452688] worker_thread+0x87/0xb50
[ 3991.452691] kthread+0x334/0x3f0
[ 3991.452693] ret_from_fork+0x24/0x50
[ 3991.865403] kmemleak: Automatic memory scanning thread ended
[ 3992.174346] -----------[ cut here ]-----------

Cheers,
Longman
Andrew Morton March 7, 2024, 7:46 p.m. UTC | #2
On Thu,  7 Mar 2024 13:47:07 -0500 Waiman Long <longman@redhat.com> wrote:

> When some error conditions happen (like OOM), some kmemleak functions
> call printk() to dump out some useful debugging information while holding
> the kmemleak_lock. This may cause deadlock as the printk() function
> may need to allocate additional memory leading to a create_object()
> call acquiring kmemleak_lock again.
> 
> An abbreviated lockdep splat is as follows:
>
> ...
> 
> Fix this deadlock issue by making sure that printk() is only called
> after releasing the kmemleak_lock.
> 
> ...
>
> @@ -427,9 +442,19 @@ static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
>  		else if (untagged_objp == untagged_ptr || alias)
>  			return object;
>  		else {
> +			if (!get_object(object))
> +				break;
> +			/*
> +			 * Release kmemleak_lock temporarily to avoid deadlock
> +			 * in printk(). dump_object_info() is called without
> +			 * holding object->lock (race unlikely).
> +			 */
> +			raw_spin_unlock(&kmemleak_lock);
>  			kmemleak_warn("Found object by alias at 0x%08lx\n",
>  				      ptr);
>  			dump_object_info(object);
> +			put_object(object);
> +			raw_spin_lock(&kmemleak_lock);
>  			break;

Please include a full description of why this is safe.  Once we've
dropped that lock, the tree is in an unknown state and we shouldn't
touch it again.  This consideration should be added to the relevant
functions' interface documentation and the code should be reviewed to
ensure that we're actually adhering to this.  Or something like that.

To simply drop and reacquire a lock without supporting analysis and
comments does not inspire confidence!
Catalin Marinas March 27, 2024, 5:43 p.m. UTC | #3
On Thu, Mar 07, 2024 at 11:46:30AM -0800, Andrew Morton wrote:
> On Thu,  7 Mar 2024 13:47:07 -0500 Waiman Long <longman@redhat.com> wrote:
> > When some error conditions happen (like OOM), some kmemleak functions
> > call printk() to dump out some useful debugging information while holding
> > the kmemleak_lock. This may cause deadlock as the printk() function
> > may need to allocate additional memory leading to a create_object()
> > call acquiring kmemleak_lock again.
> > 
> > An abbreviated lockdep splat is as follows:
> >
> > ...
> > 
> > Fix this deadlock issue by making sure that printk() is only called
> > after releasing the kmemleak_lock.
> > 
> > ...
> >
> > @@ -427,9 +442,19 @@ static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
> >  		else if (untagged_objp == untagged_ptr || alias)
> >  			return object;
> >  		else {
> > +			if (!get_object(object))
> > +				break;
> > +			/*
> > +			 * Release kmemleak_lock temporarily to avoid deadlock
> > +			 * in printk(). dump_object_info() is called without
> > +			 * holding object->lock (race unlikely).
> > +			 */
> > +			raw_spin_unlock(&kmemleak_lock);
> >  			kmemleak_warn("Found object by alias at 0x%08lx\n",
> >  				      ptr);
> >  			dump_object_info(object);
> > +			put_object(object);
> > +			raw_spin_lock(&kmemleak_lock);
> >  			break;
> 
> Please include a full description of why this is safe.  Once we've
> dropped that lock, the tree is in an unknown state and we shouldn't
> touch it again.  This consideration should be added to the relevant
> functions' interface documentation and the code should be reviewed to
> ensure that we're actually adhering to this.  Or something like that.
> 
> To simply drop and reacquire a lock without supporting analysis and
> comments does not inspire confidence!

I agree it looks fragile. I think it works: the code tends to bail out
on those errors and doesn't expect the protected data to have remained
intact. But we may change it in the future and forget about this.

I wonder whether we can make things slightly easier to reason about by
deferring the printing until unlock and storing the details in some
per-CPU variable. Another option would be a per-CPU array to store
potential recursive kmemleak_*() callbacks during the critical
regions. This should be bounded since interrupts are disabled. On
unlock, we'd replay the array and add those pointers.
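
A rough sketch of the deferred-printing variant of that idea, purely to
illustrate the shape it could take (the names, sizes and helpers below are
hypothetical and would sit in mm/kmemleak.c; nothing like this exists in
the tree):

/*
 * Hypothetical sketch: record a bounded number of fully formatted warning
 * messages per CPU while kmemleak_lock is held (interrupts are disabled,
 * so the depth is bounded), then replay them right after the lock is
 * dropped, while interrupts are still off.
 */
#define KMEMLEAK_DEFER_MAX	4

struct kmemleak_deferred_msgs {
	unsigned int	nr;
	char		msg[KMEMLEAK_DEFER_MAX][128];
};
static DEFINE_PER_CPU(struct kmemleak_deferred_msgs, kmemleak_deferred);

/* Called with kmemleak_lock held and interrupts disabled. */
static __printf(1, 2) void kmemleak_defer_warn(const char *fmt, ...)
{
	struct kmemleak_deferred_msgs *d = this_cpu_ptr(&kmemleak_deferred);
	va_list args;

	if (d->nr >= KMEMLEAK_DEFER_MAX)
		return;

	va_start(args, fmt);
	vsnprintf(d->msg[d->nr++], sizeof(d->msg[0]), fmt, args);
	va_end(args);
}

/* Called immediately after dropping kmemleak_lock, interrupts still off. */
static void kmemleak_flush_deferred(void)
{
	struct kmemleak_deferred_msgs *d = this_cpu_ptr(&kmemleak_deferred);
	unsigned int i;

	for (i = 0; i < d->nr; i++)
		pr_warn("%s", d->msg[i]);
	d->nr = 0;
}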
Waiman Long March 27, 2024, 6:58 p.m. UTC | #4
On 3/27/24 13:43, Catalin Marinas wrote:
> On Thu, Mar 07, 2024 at 11:46:30AM -0800, Andrew Morton wrote:
>> On Thu,  7 Mar 2024 13:47:07 -0500 Waiman Long <longman@redhat.com> wrote:
>>> When some error conditions happen (like OOM), some kmemleak functions
>>> call printk() to dump out some useful debugging information while holding
>>> the kmemleak_lock. This may cause deadlock as the printk() function
>>> may need to allocate additional memory leading to a create_object()
>>> call acquiring kmemleak_lock again.
>>>
>>> An abbreviated lockdep splat is as follows:
>>>
>>> ...
>>>
>>> Fix this deadlock issue by making sure that printk() is only called
>>> after releasing the kmemleak_lock.
>>>
>>> ...
>>>
>>> @@ -427,9 +442,19 @@ static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
>>>   		else if (untagged_objp == untagged_ptr || alias)
>>>   			return object;
>>>   		else {
>>> +			if (!get_object(object))
>>> +				break;
>>> +			/*
>>> +			 * Release kmemleak_lock temporarily to avoid deadlock
>>> +			 * in printk(). dump_object_info() is called without
>>> +			 * holding object->lock (race unlikely).
>>> +			 */
>>> +			raw_spin_unlock(&kmemleak_lock);
>>>   			kmemleak_warn("Found object by alias at 0x%08lx\n",
>>>   				      ptr);
>>>   			dump_object_info(object);
>>> +			put_object(object);
>>> +			raw_spin_lock(&kmemleak_lock);
>>>   			break;
>> Please include a full description of why this is safe.  Once we've
>> dropped that lock, the tree is in an unknown state and we shouldn't
>> touch it again.  This consideration should be added to the relevant
>> functions' interface documentation and the code should be reviewed to
>> ensure that we're actually adhering to this.  Or something like that.
>>
>> To simply drop and reacquire a lock without supporting analysis and
>> comments does not inspire confidence!
> I agree it looks fragile. I think it works, the code tends to bail out
> on those errors and doesn't expect the protected data to have remained
> intact. But we may change it in the future and forgot about this.
>
> I wonder whether we can actually make things slightly easier to reason
> about, defer the printing until unlock, store the details in some
> per-cpu variable. Another option would be to have a per-CPU array to
> store potential recursive kmemleak_*() callbacks during the critical
> regions. This should be bounded since the interrupts are disabled. On
> unlock, we'd replay the array and add those pointers.

It looks like most of the callers of __lookup_object() will bail out
when an error happens, so there should be no harm in temporarily
releasing the lock. However, I do agree that it is fragile and future
changes may break it. This patch certainly needs more work.

Cheers,
Longman

>
diff mbox series

Patch

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 6a540c2b27c5..4f58f6170cdf 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -401,6 +401,19 @@  static struct rb_root *object_tree(unsigned long objflags)
 	return &object_tree_root;
 }
 
+/*
+ * Increment the object use_count. Return 1 if successful or 0 otherwise. Note
+ * that once an object's use_count reached 0, the RCU freeing was already
+ * registered and the object should no longer be used. This function must be
+ * called under the protection of rcu_read_lock().
+ */
+static int get_object(struct kmemleak_object *object)
+{
+	return atomic_inc_not_zero(&object->use_count);
+}
+
+static void put_object(struct kmemleak_object *object);
+
 /*
  * Look-up a memory block metadata (kmemleak_object) in the object search
  * tree based on a pointer value. If alias is 0, only values pointing to the
@@ -413,6 +426,8 @@  static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
 	struct rb_node *rb = object_tree(objflags)->rb_node;
 	unsigned long untagged_ptr = (unsigned long)kasan_reset_tag((void *)ptr);
 
+	lockdep_assert_held(&kmemleak_lock);
+
 	while (rb) {
 		struct kmemleak_object *object;
 		unsigned long untagged_objp;
@@ -427,9 +442,19 @@  static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
 		else if (untagged_objp == untagged_ptr || alias)
 			return object;
 		else {
+			if (!get_object(object))
+				break;
+			/*
+			 * Release kmemleak_lock temporarily to avoid deadlock
+			 * in printk(). dump_object_info() is called without
+			 * holding object->lock (race unlikely).
+			 */
+			raw_spin_unlock(&kmemleak_lock);
 			kmemleak_warn("Found object by alias at 0x%08lx\n",
 				      ptr);
 			dump_object_info(object);
+			put_object(object);
+			raw_spin_lock(&kmemleak_lock);
 			break;
 		}
 	}
@@ -442,22 +467,12 @@  static struct kmemleak_object *lookup_object(unsigned long ptr, int alias)
 	return __lookup_object(ptr, alias, 0);
 }
 
-/*
- * Increment the object use_count. Return 1 if successful or 0 otherwise. Note
- * that once an object's use_count reached 0, the RCU freeing was already
- * registered and the object should no longer be used. This function must be
- * called under the protection of rcu_read_lock().
- */
-static int get_object(struct kmemleak_object *object)
-{
-	return atomic_inc_not_zero(&object->use_count);
-}
-
 /*
  * Memory pool allocation and freeing. kmemleak_lock must not be held.
  */
 static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
 {
+	bool warn = false;
 	unsigned long flags;
 	struct kmemleak_object *object;
 
@@ -477,9 +492,11 @@  static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
 	else if (mem_pool_free_count)
 		object = &mem_pool[--mem_pool_free_count];
 	else
-		pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
+		warn = true;
 	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 
+	if (unlikely(warn))
+		pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
 	return object;
 }
 
@@ -692,6 +709,8 @@  static int __link_object(struct kmemleak_object *object, unsigned long ptr,
 	unsigned long untagged_ptr;
 	unsigned long untagged_objp;
 
+	lockdep_assert_held(&kmemleak_lock);
+
 	object->flags = OBJECT_ALLOCATED | objflags;
 	object->pointer = ptr;
 	object->size = kfence_ksize((void *)ptr) ?: size;
@@ -718,13 +737,20 @@  static int __link_object(struct kmemleak_object *object, unsigned long ptr,
 		else if (untagged_objp + parent->size <= untagged_ptr)
 			link = &parent->rb_node.rb_right;
 		else {
-			kmemleak_stop("Cannot insert 0x%lx into the object search tree (overlaps existing)\n",
-				      ptr);
+			if (!get_object(parent))
+				return -EEXIST;
 			/*
-			 * No need for parent->lock here since "parent" cannot
-			 * be freed while the kmemleak_lock is held.
+			 * Release kmemleak_lock temporarily to avoid deadlock
+			 * in printk(). dump_object_info() is called without
+			 * holding parent->lock (race unlikely).
 			 */
+			raw_spin_unlock(&kmemleak_lock);
+
+			kmemleak_stop("Cannot insert 0x%lx into the object search tree (overlaps existing)\n",
+				      ptr);
 			dump_object_info(parent);
+			put_object(parent);
+			raw_spin_lock(&kmemleak_lock);
 			return -EEXIST;
 		}
 	}
@@ -839,11 +865,12 @@  static void delete_object_part(unsigned long ptr, size_t size,
 	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 	object = __find_and_remove_object(ptr, 1, objflags);
 	if (!object) {
+		raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 #ifdef DEBUG
 		kmemleak_warn("Partially freeing unknown object at 0x%08lx (size %zu)\n",
 			      ptr, size);
 #endif
-		goto unlock;
+		goto out;
 	}
 
 	/*
@@ -862,7 +889,6 @@  static void delete_object_part(unsigned long ptr, size_t size,
 			   object->min_count, objflags))
 		object_r = NULL;
 
-unlock:
 	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 	if (object)
 		__delete_object(object);