Message ID | 1542877459-144382-1-git-send-email-zhe.he@windriver.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT | expand |
On 2018-11-22 17:04:19 [+0800], zhe.he@windriver.com wrote: > From: He Zhe <zhe.he@windriver.com> > > kmemleak_lock, as a rwlock on RT, can possibly be held in atomic context and > causes the follow BUG. please use [PATCH RT … ] in future while posting for RT. And this was (and is) on my TODO list. Sebastian
On 2018-11-22 17:04:19 [+0800], zhe.he@windriver.com wrote: > From: He Zhe <zhe.he@windriver.com> > > kmemleak_lock, as a rwlock on RT, can possibly be held in atomic context and > causes the follow BUG. > > BUG: scheduling while atomic: migration/15/132/0x00000002 … > Preemption disabled at: > [<ffffffff8c927c11>] cpu_stopper_thread+0x71/0x100 > CPU: 15 PID: 132 Comm: migration/15 Not tainted 4.19.0-rt1-preempt-rt #1 > Hardware name: Intel Corp. Harcuvar/Server, BIOS HAVLCRB1.X64.0015.D62.1708310404 08/31/2017 > Call Trace: > dump_stack+0x4f/0x6a > ? cpu_stopper_thread+0x71/0x100 > __schedule_bug.cold.16+0x38/0x55 > __schedule+0x484/0x6c0 > schedule+0x3d/0xe0 > rt_spin_lock_slowlock_locked+0x118/0x2a0 > rt_spin_lock_slowlock+0x57/0x90 > __rt_spin_lock+0x26/0x30 > __write_rt_lock+0x23/0x1a0 > ? intel_pmu_cpu_dying+0x67/0x70 > rt_write_lock+0x2a/0x30 > find_and_remove_object+0x1e/0x80 > delete_object_full+0x10/0x20 > kmemleak_free+0x32/0x50 > kfree+0x104/0x1f0 > ? x86_pmu_starting_cpu+0x30/0x30 > intel_pmu_cpu_dying+0x67/0x70 > x86_pmu_dying_cpu+0x1a/0x30 > cpuhp_invoke_callback+0x92/0x700 > take_cpu_down+0x70/0xa0 > multi_cpu_stop+0x62/0xc0 > ? cpu_stop_queue_work+0x130/0x130 > cpu_stopper_thread+0x79/0x100 > smpboot_thread_fn+0x20f/0x2d0 > kthread+0x121/0x140 > ? sort_range+0x30/0x30 > ? kthread_park+0x90/0x90 > ret_from_fork+0x35/0x40 If this is the only problem? kfree() from a preempt-disabled section should cause a warning even without kmemleak. > And on v4.18 stable tree the following call trace, caused by grabbing > kmemleak_lock again, is also observed. > > kernel BUG at kernel/locking/rtmutex.c:1048! > invalid opcode: 0000 [#1] PREEMPT SMP PTI > CPU: 5 PID: 689 Comm: mkfs.ext4 Not tainted 4.18.16-rt9-preempt-rt #1 … > Call Trace: > ? preempt_count_add+0x74/0xc0 > rt_spin_lock_slowlock+0x57/0x90 > ? __kernel_text_address+0x12/0x40 > ? __save_stack_trace+0x75/0x100 > __rt_spin_lock+0x26/0x30 > __write_rt_lock+0x23/0x1a0 > rt_write_lock+0x2a/0x30 > create_object+0x17d/0x2b0 … is this an RT-only problem? Because mainline should not allow read->read locking or read->write locking for reader-writer locks. If this only happens on v4.18 and not on v4.19 then something must have fixed it. Sebastian
> is this an RT-only problem? Because mainline should not allow read->read > locking or read->write locking for reader-writer locks. If this only > happens on v4.18 and not on v4.19 then something must have fixed it. Probably misunderstanding, but I'd say that read->read locking is "the norm"...? If you don't use qrwlock, readers are also "recursive", in part., P0 P1 read_lock(l) write_lock(l) read_lock(l) won't block P0 on the second read_lock(). (qrwlock somehow complicate the analysis; IIUC, they are recursive if and only if in_interrupt().). Andrea > > > Sebastian
On 2018-11-23 12:02:55 [+0100], Andrea Parri wrote: > > is this an RT-only problem? Because mainline should not allow read->read > > locking or read->write locking for reader-writer locks. If this only > > happens on v4.18 and not on v4.19 then something must have fixed it. > > Probably misunderstanding, but I'd say that read->read locking is "the > norm"...? > > If you don't use qrwlock, readers are also "recursive", in part., > > P0 P1 > read_lock(l) > write_lock(l) > read_lock(l) > > won't block P0 on the second read_lock(). (qrwlock somehow complicate > the analysis; IIUC, they are recursive if and only if in_interrupt().). ehm, peterz, is that true? My memory on that is that all readers will block if there is a writer pending. > Andrea Sebastian
On Fri, Nov 23, 2018 at 12:06:11PM +0100, Sebastian Andrzej Siewior wrote: > On 2018-11-23 12:02:55 [+0100], Andrea Parri wrote: > > > is this an RT-only problem? Because mainline should not allow read->read > > > locking or read->write locking for reader-writer locks. If this only > > > happens on v4.18 and not on v4.19 then something must have fixed it. > > > > Probably misunderstanding, but I'd say that read->read locking is "the > > norm"...? > > > > If you don't use qrwlock, readers are also "recursive", in part., > > > > P0 P1 > > read_lock(l) > > write_lock(l) > > read_lock(l) > > > > won't block P0 on the second read_lock(). (qrwlock somehow complicate > > the analysis; IIUC, they are recursive if and only if in_interrupt().). > > ehm, peterz, is that true? My memory on that is that all readers will > block if there is a writer pending. With qwrlocks, the readers will normally block if there is a pending writer (to avoid starving the writer), unless in_interrupt() when the readers are allowed to starve a pending writer. TLA+/PlusCal model here: ;) https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/qrwlock.tla
On Fri, 23 Nov 2018 11:31:31 +0000 Catalin Marinas <catalin.marinas@arm.com> wrote: > With qwrlocks, the readers will normally block if there is a pending > writer (to avoid starving the writer), unless in_interrupt() when the > readers are allowed to starve a pending writer. > > TLA+/PlusCal model here: ;) > > https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/qrwlock.tla > And the code appears to confirm it too: void queued_read_lock_slowpath(struct qrwlock *lock) { /* * Readers come here when they cannot get the lock without waiting */ if (unlikely(in_interrupt())) { /* * Readers in interrupt context will get the lock immediately * if the writer is just waiting (not holding the lock yet), * so spin with ACQUIRE semantics until the lock is available * without waiting in the queue. */ atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED)); return; } atomic_sub(_QR_BIAS, &lock->cnts); /* * Put the reader into the wait queue */ arch_spin_lock(&lock->wait_lock); atomic_add(_QR_BIAS, &lock->cnts); -- Steve
On 2018/11/23 17:53, Sebastian Andrzej Siewior wrote: > On 2018-11-22 17:04:19 [+0800], zhe.he@windriver.com wrote: >> From: He Zhe <zhe.he@windriver.com> >> >> kmemleak_lock, as a rwlock on RT, can possibly be held in atomic context and >> causes the follow BUG. >> >> BUG: scheduling while atomic: migration/15/132/0x00000002 > … >> Preemption disabled at: >> [<ffffffff8c927c11>] cpu_stopper_thread+0x71/0x100 >> CPU: 15 PID: 132 Comm: migration/15 Not tainted 4.19.0-rt1-preempt-rt #1 >> Hardware name: Intel Corp. Harcuvar/Server, BIOS HAVLCRB1.X64.0015.D62.1708310404 08/31/2017 >> Call Trace: >> dump_stack+0x4f/0x6a >> ? cpu_stopper_thread+0x71/0x100 >> __schedule_bug.cold.16+0x38/0x55 >> __schedule+0x484/0x6c0 >> schedule+0x3d/0xe0 >> rt_spin_lock_slowlock_locked+0x118/0x2a0 >> rt_spin_lock_slowlock+0x57/0x90 >> __rt_spin_lock+0x26/0x30 >> __write_rt_lock+0x23/0x1a0 >> ? intel_pmu_cpu_dying+0x67/0x70 >> rt_write_lock+0x2a/0x30 >> find_and_remove_object+0x1e/0x80 >> delete_object_full+0x10/0x20 >> kmemleak_free+0x32/0x50 >> kfree+0x104/0x1f0 >> ? x86_pmu_starting_cpu+0x30/0x30 >> intel_pmu_cpu_dying+0x67/0x70 >> x86_pmu_dying_cpu+0x1a/0x30 >> cpuhp_invoke_callback+0x92/0x700 >> take_cpu_down+0x70/0xa0 >> multi_cpu_stop+0x62/0xc0 >> ? cpu_stop_queue_work+0x130/0x130 >> cpu_stopper_thread+0x79/0x100 >> smpboot_thread_fn+0x20f/0x2d0 >> kthread+0x121/0x140 >> ? sort_range+0x30/0x30 >> ? kthread_park+0x90/0x90 >> ret_from_fork+0x35/0x40 > If this is the only problem? kfree() from a preempt-disabled section > should cause a warning even without kmemleak. Thanks for your review. I just did some tests aginst the latest code. On latest v4.19.1-rt3, both of the call traces can be reproduced with kmemleak enabied. And none can be reproduced with kmemleak disabled. On latest mainline tree, none can be reproduced no matter kmemleak is enabled or disabled. I don't get why kfree from a preempt-disabled section should cause a warning without kmemleak, since kfree can't sleep. If I understand correctly, the call trace above is caused by trying to schedule after preemption is disabled, which cannot be reached in mainline kernel. So we might need to turn to use raw lock to keep preemption disabled. > >> And on v4.18 stable tree the following call trace, caused by grabbing >> kmemleak_lock again, is also observed. >> >> kernel BUG at kernel/locking/rtmutex.c:1048! >> invalid opcode: 0000 [#1] PREEMPT SMP PTI >> CPU: 5 PID: 689 Comm: mkfs.ext4 Not tainted 4.18.16-rt9-preempt-rt #1 > … >> Call Trace: >> ? preempt_count_add+0x74/0xc0 >> rt_spin_lock_slowlock+0x57/0x90 >> ? __kernel_text_address+0x12/0x40 >> ? __save_stack_trace+0x75/0x100 >> __rt_spin_lock+0x26/0x30 >> __write_rt_lock+0x23/0x1a0 >> rt_write_lock+0x2a/0x30 >> create_object+0x17d/0x2b0 > … > > is this an RT-only problem? Because mainline should not allow read->read > locking or read->write locking for reader-writer locks. If this only > happens on v4.18 and not on v4.19 then something must have fixed it. From what I reached above, this is RT-only and happens on v4.18 and v4.19. The call trace above is caused by grabbing kmemleak_lock and then getting scheduled and then re-grabbing kmemleak_lock. Using raw lock can also solve this problem. Thanks, Zhe > > > Sebastian >
On Fri, Nov 23, 2018 at 12:06:11PM +0100, Sebastian Andrzej Siewior wrote: > On 2018-11-23 12:02:55 [+0100], Andrea Parri wrote: > > > is this an RT-only problem? Because mainline should not allow read->read > > > locking or read->write locking for reader-writer locks. If this only > > > happens on v4.18 and not on v4.19 then something must have fixed it. > > > > Probably misunderstanding, but I'd say that read->read locking is "the > > norm"...? > > > > If you don't use qrwlock, readers are also "recursive", in part., > > > > P0 P1 > > read_lock(l) > > write_lock(l) > > read_lock(l) > > > > won't block P0 on the second read_lock(). (qrwlock somehow complicate > > the analysis; IIUC, they are recursive if and only if in_interrupt().). > > ehm, peterz, is that true? My memory on that is that all readers will > block if there is a writer pending. Since qrwlock is the more strict, all users should use its semantics. Just like we cannot 'rely' on the unfairness of some lock implementations.
On 2018-11-24 22:26:46 [+0800], He Zhe wrote: > On latest v4.19.1-rt3, both of the call traces can be reproduced with kmemleak > enabied. And none can be reproduced with kmemleak disabled. okay. So it needs attention. > On latest mainline tree, none can be reproduced no matter kmemleak is enabled > or disabled. > > I don't get why kfree from a preempt-disabled section should cause a warning > without kmemleak, since kfree can't sleep. it might. It will acquire a sleeping lock if it has go down to the memory allocator to actually give memory back. > If I understand correctly, the call trace above is caused by trying to schedule > after preemption is disabled, which cannot be reached in mainline kernel. So > we might need to turn to use raw lock to keep preemption disabled. The buddy-allocator runs with spin locks so it is okay on !RT. So you can use kfree() with disabled preemption or disabled interrupts. I don't think that we want to use raw-locks in the buddy-allocator. > >From what I reached above, this is RT-only and happens on v4.18 and v4.19. > > The call trace above is caused by grabbing kmemleak_lock and then getting > scheduled and then re-grabbing kmemleak_lock. Using raw lock can also solve > this problem. But this is a reader / writer lock. And if I understand the other part of the thread then it needs multiple readers. Couldn't we just get rid of that kfree() or move it somewhere else? I mean if the free() memory on CPU-down and allocate it again CPU-up then we could skip that, rigth? Just allocate it and don't free it because the CPU will likely get up again. > Thanks, > Zhe Sebastian
On 2018/12/1 02:19, Sebastian Andrzej Siewior wrote: > On 2018-11-24 22:26:46 [+0800], He Zhe wrote: >> On latest v4.19.1-rt3, both of the call traces can be reproduced with kmemleak >> enabied. And none can be reproduced with kmemleak disabled. > okay. So it needs attention. > >> On latest mainline tree, none can be reproduced no matter kmemleak is enabled >> or disabled. >> >> I don't get why kfree from a preempt-disabled section should cause a warning >> without kmemleak, since kfree can't sleep. > it might. It will acquire a sleeping lock if it has go down to the > memory allocator to actually give memory back. Got it. Thanks. > >> If I understand correctly, the call trace above is caused by trying to schedule >> after preemption is disabled, which cannot be reached in mainline kernel. So >> we might need to turn to use raw lock to keep preemption disabled. > The buddy-allocator runs with spin locks so it is okay on !RT. So you > can use kfree() with disabled preemption or disabled interrupts. > I don't think that we want to use raw-locks in the buddy-allocator. For call trace 1: I went through the calling paths inside kfree() and found that there have already been things using raw lock in it as follow. 1) in the path of kfree() itself kfree -> slab_free -> do_slab_free -> __slab_free -> raw_spin_lock_irqsave 2) in the path of CONFIG_DEBUG_OBJECTS_FREE kfree -> slab_free -> slab_free_freelist_hook -> slab_free_hook -> debug_check_no_obj_freed -> __debug_check_no_obj_freed -> raw_spin_lock_irqsave 3) in the path of CONFIG_LOCKDEP kfree -> __free_pages -> free_unref_page -> free_unref_page_prepare -> free_pcp_prepare -> free_pages_prepare -> debug_check_no_locks_freed -> debug_check_no_locks_freed -> raw_local_irq_save Since kmemleak would most likely be used to debug in environments where we would not expect as great performance as without it, and kfree() has raw locks in its main path and other debug function paths, I suppose it wouldn't hurt that we change to raw locks. > >> >From what I reached above, this is RT-only and happens on v4.18 and v4.19. >> >> The call trace above is caused by grabbing kmemleak_lock and then getting >> scheduled and then re-grabbing kmemleak_lock. Using raw lock can also solve >> this problem. > But this is a reader / writer lock. And if I understand the other part > of the thread then it needs multiple readers. For call trace 2: I don't get what "it needs multiple readers" exactly means here. In this call trace, the kmemleak_lock is grabbed as write lock, and then scheduled away, and then grabbed again as write lock from another path. It's a write->write locking, compared to the discussion in the other part of the thread. This is essentially because kmemleak hooks on the very low level memory allocation and free operations. After scheduled away, it can easily re-enter itself. We need raw locks to prevent this from happening. > Couldn't we just get rid of that kfree() or move it somewhere else? > I mean if the free() memory on CPU-down and allocate it again CPU-up > then we could skip that, rigth? Just allocate it and don't free it > because the CPU will likely get up again. For call trace 1: I went through the CPU hotplug code and found that the allocation of the problematic data, cpuc->shared_regs, is done in intel_pmu_cpu_prepare. And the free is done in intel_pmu_cpu_dying. They are handlers triggered by two different perf events. It seems we can hardly form a convincing method that holds the data while CPUs are off and then uses it again. raw locks would be easy and good enough. Thanks, Zhe > >> Thanks, >> Zhe > Sebastian >
On 2018-12-05 21:53:37 [+0800], He Zhe wrote: > For call trace 1: … > Since kmemleak would most likely be used to debug in environments where > we would not expect as great performance as without it, and kfree() has raw locks > in its main path and other debug function paths, I suppose it wouldn't hurt that > we change to raw locks. okay. > >> >From what I reached above, this is RT-only and happens on v4.18 and v4.19. > >> > >> The call trace above is caused by grabbing kmemleak_lock and then getting > >> scheduled and then re-grabbing kmemleak_lock. Using raw lock can also solve > >> this problem. > > But this is a reader / writer lock. And if I understand the other part > > of the thread then it needs multiple readers. > > For call trace 2: > > I don't get what "it needs multiple readers" exactly means here. > > In this call trace, the kmemleak_lock is grabbed as write lock, and then scheduled > away, and then grabbed again as write lock from another path. It's a > write->write locking, compared to the discussion in the other part of the thread. > > This is essentially because kmemleak hooks on the very low level memory > allocation and free operations. After scheduled away, it can easily re-enter itself. > We need raw locks to prevent this from happening. With raw locks you wouldn't have multiple readers at the same time. Maybe you wouldn't have recursion but since you can't have multiple readers you would add lock contention where was none (because you could have two readers at the same time). > > Couldn't we just get rid of that kfree() or move it somewhere else? > > I mean if the free() memory on CPU-down and allocate it again CPU-up > > then we could skip that, rigth? Just allocate it and don't free it > > because the CPU will likely get up again. > > For call trace 1: > > I went through the CPU hotplug code and found that the allocation of the > problematic data, cpuc->shared_regs, is done in intel_pmu_cpu_prepare. And > the free is done in intel_pmu_cpu_dying. They are handlers triggered by two > different perf events. > > It seems we can hardly form a convincing method that holds the data while > CPUs are off and then uses it again. raw locks would be easy and good enough. Why not allocate the memory in intel_pmu_cpu_prepare() if it is not already there (otherwise skip the allocation) and in intel_pmu_cpu_dying() not free it. It looks easy. > Thanks, > Zhe Sebastian
On 2018/12/6 03:14, Sebastian Andrzej Siewior wrote: > On 2018-12-05 21:53:37 [+0800], He Zhe wrote: >> For call trace 1: > … >> Since kmemleak would most likely be used to debug in environments where >> we would not expect as great performance as without it, and kfree() has raw locks >> in its main path and other debug function paths, I suppose it wouldn't hurt that >> we change to raw locks. > okay. > >>>> >From what I reached above, this is RT-only and happens on v4.18 and v4.19. >>>> >>>> The call trace above is caused by grabbing kmemleak_lock and then getting >>>> scheduled and then re-grabbing kmemleak_lock. Using raw lock can also solve >>>> this problem. >>> But this is a reader / writer lock. And if I understand the other part >>> of the thread then it needs multiple readers. >> For call trace 2: >> >> I don't get what "it needs multiple readers" exactly means here. >> >> In this call trace, the kmemleak_lock is grabbed as write lock, and then scheduled >> away, and then grabbed again as write lock from another path. It's a >> write->write locking, compared to the discussion in the other part of the thread. >> >> This is essentially because kmemleak hooks on the very low level memory >> allocation and free operations. After scheduled away, it can easily re-enter itself. >> We need raw locks to prevent this from happening. > With raw locks you wouldn't have multiple readers at the same time. > Maybe you wouldn't have recursion but since you can't have multiple > readers you would add lock contention where was none (because you could > have two readers at the same time). Sorry for slow reply. OK. I understand your concern finally. At the commit log said, I wanted to use raw rwlock but didn't find the DEFINE helper for it. Thinking it would not be expected to have great performance, I turn to use raw spinlock instead. And yes, this would introduce worse performance. Maybe I miss the reason, but why don't we have rwlock_types_raw.h to define raw rwlock helper for RT? With that, we can cleanly replace kmemleak_lock with a raw rwlock. Or should we just define a raw rwlock using basic type, like arch_rwlock_t, only in kmemleak? > >>> Couldn't we just get rid of that kfree() or move it somewhere else? >>> I mean if the free() memory on CPU-down and allocate it again CPU-up >>> then we could skip that, rigth? Just allocate it and don't free it >>> because the CPU will likely get up again. >> For call trace 1: >> >> I went through the CPU hotplug code and found that the allocation of the >> problematic data, cpuc->shared_regs, is done in intel_pmu_cpu_prepare. And >> the free is done in intel_pmu_cpu_dying. They are handlers triggered by two >> different perf events. >> >> It seems we can hardly form a convincing method that holds the data while >> CPUs are off and then uses it again. raw locks would be easy and good enough. > Why not allocate the memory in intel_pmu_cpu_prepare() if it is not > already there (otherwise skip the allocation) and in > intel_pmu_cpu_dying() not free it. It looks easy. Thanks for your suggestion. I've sent the change for call trace 1 to mainline mailing list. Hopefully it can be accepted. Zhe > >> Thanks, >> Zhe > Sebastian >
On Tue, Dec 18, 2018 at 06:51:41PM +0800, He Zhe wrote: > On 2018/12/6 03:14, Sebastian Andrzej Siewior wrote: > > With raw locks you wouldn't have multiple readers at the same time. > > Maybe you wouldn't have recursion but since you can't have multiple > > readers you would add lock contention where was none (because you could > > have two readers at the same time). > > OK. I understand your concern finally. At the commit log said, I wanted to use raw > rwlock but didn't find the DEFINE helper for it. Thinking it would not be expected to > have great performance, I turn to use raw spinlock instead. And yes, this would > introduce worse performance. Looking through the kmemleak code, I can't really find significant reader contention. The longest holder of this lock (read) is the scan thread which is also protected by a scan_mutex, so can't run concurrently with another scanner (triggered through debugfs). The other read_lock(&kmemleak_lock) user is find_and_get_object() called from a few places. However, all such places normally follow a create_object() call (kmemleak_alloc() and friends) which already performs a write_lock(&kmemleak_lock), so it needs to wait for the scan thread to release the kmemleak_lock. It may be worth running some performance/latency tests during kmemleak scanning (echo scan > /sys/kernel/debug/kmemleak) but at a quick look, I don't think we'd see any difference with a raw_spin_lock_t. With a bit more thinking (though I'll be off until the new year), we could probably get rid of the kmemleak_lock entirely in scan_block() and make lookup_object() and the related rbtree code in kmemleak RCU-safe.
On Thu, Nov 22, 2018 at 05:04:19PM +0800, zhe.he@windriver.com wrote: > From: He Zhe <zhe.he@windriver.com> > > kmemleak_lock, as a rwlock on RT, can possibly be held in atomic context and > causes the follow BUG. > > BUG: scheduling while atomic: migration/15/132/0x00000002 > Modules linked in: iTCO_wdt iTCO_vendor_support intel_rapl pcc_cpufreq > pnd2_edac intel_powerclamp coretemp crct10dif_pclmul crct10dif_common > aesni_intel matroxfb_base aes_x86_64 matroxfb_g450 matroxfb_accel > crypto_simd matroxfb_DAC1064 cryptd glue_helper g450_pll matroxfb_misc > i2c_ismt i2c_i801 acpi_cpufreq > Preemption disabled at: > [<ffffffff8c927c11>] cpu_stopper_thread+0x71/0x100 > CPU: 15 PID: 132 Comm: migration/15 Not tainted 4.19.0-rt1-preempt-rt #1 > Hardware name: Intel Corp. Harcuvar/Server, BIOS HAVLCRB1.X64.0015.D62.1708310404 08/31/2017 > Call Trace: > dump_stack+0x4f/0x6a > ? cpu_stopper_thread+0x71/0x100 > __schedule_bug.cold.16+0x38/0x55 > __schedule+0x484/0x6c0 > schedule+0x3d/0xe0 > rt_spin_lock_slowlock_locked+0x118/0x2a0 > rt_spin_lock_slowlock+0x57/0x90 > __rt_spin_lock+0x26/0x30 > __write_rt_lock+0x23/0x1a0 > ? intel_pmu_cpu_dying+0x67/0x70 > rt_write_lock+0x2a/0x30 > find_and_remove_object+0x1e/0x80 > delete_object_full+0x10/0x20 > kmemleak_free+0x32/0x50 > kfree+0x104/0x1f0 > ? x86_pmu_starting_cpu+0x30/0x30 > intel_pmu_cpu_dying+0x67/0x70 > x86_pmu_dying_cpu+0x1a/0x30 > cpuhp_invoke_callback+0x92/0x700 > take_cpu_down+0x70/0xa0 > multi_cpu_stop+0x62/0xc0 > ? cpu_stop_queue_work+0x130/0x130 > cpu_stopper_thread+0x79/0x100 > smpboot_thread_fn+0x20f/0x2d0 > kthread+0x121/0x140 > ? sort_range+0x30/0x30 > ? kthread_park+0x90/0x90 > ret_from_fork+0x35/0x40 > > And on v4.18 stable tree the following call trace, caused by grabbing > kmemleak_lock again, is also observed. > > kernel BUG at kernel/locking/rtmutex.c:1048! > invalid opcode: 0000 [#1] PREEMPT SMP PTI > CPU: 5 PID: 689 Comm: mkfs.ext4 Not tainted 4.18.16-rt9-preempt-rt #1 > Hardware name: Intel Corp. Harcuvar/Server, BIOS HAVLCRB1.X64.0015.D62.1708310404 08/31/2017 > RIP: 0010:rt_spin_lock_slowlock_locked+0x277/0x2a0 > Code: e8 5e 64 61 ff e9 bc fe ff ff e8 54 64 61 ff e9 b7 fe ff ff 0f 0b e8 98 57 53 ff e9 43 fe ff ff e8 8e 57 53 ff e9 74 ff ff ff <0f> 0b 0f 0b 0f 0b 48 8b 43 10 48 85 c0 74 06 48 3b 58 38 75 0b 49 > RSP: 0018:ffff936846d4f3b0 EFLAGS: 00010046 > RAX: ffff8e3680361e00 RBX: ffffffff83a8b240 RCX: 0000000000000001 > RDX: 0000000000000000 RSI: ffff8e3680361e00 RDI: ffffffff83a8b258 > RBP: ffff936846d4f3e8 R08: ffff8e3680361e01 R09: ffffffff82adfdf0 > R10: ffffffff827ede18 R11: 0000000000000000 R12: ffff936846d4f3f8 > R13: ffff8e3680361e00 R14: ffff936846d4f3f8 R15: 0000000000000246 > FS: 00007fc8b6bfd780(0000) GS:ffff8e369f340000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055fb5659e000 CR3: 00000007fdd14000 CR4: 00000000003406e0 > Call Trace: > ? preempt_count_add+0x74/0xc0 > rt_spin_lock_slowlock+0x57/0x90 > ? __kernel_text_address+0x12/0x40 > ? __save_stack_trace+0x75/0x100 > __rt_spin_lock+0x26/0x30 > __write_rt_lock+0x23/0x1a0 > rt_write_lock+0x2a/0x30 > create_object+0x17d/0x2b0 > kmemleak_alloc+0x34/0x50 > kmem_cache_alloc+0x146/0x220 > ? mempool_alloc_slab+0x15/0x20 > mempool_alloc_slab+0x15/0x20 > mempool_alloc+0x65/0x170 > sg_pool_alloc+0x21/0x60 > __sg_alloc_table+0x101/0x160 > ? sg_free_table_chained+0x30/0x30 > sg_alloc_table_chained+0x8b/0xb0 > scsi_init_sgtable+0x31/0x90 > scsi_init_io+0x44/0x130 > sd_setup_write_same16_cmnd+0xef/0x150 > sd_init_command+0x6bf/0xaa0 > ? cgroup_base_stat_cputime_account_end.isra.0+0x26/0x60 > ? elv_rb_del+0x2a/0x40 > scsi_setup_cmnd+0x8e/0x140 > scsi_prep_fn+0x5d/0x140 > blk_peek_request+0xda/0x2f0 > scsi_request_fn+0x33/0x550 > ? cfq_rb_erase+0x23/0x40 > __blk_run_queue+0x43/0x60 > cfq_insert_request+0x2f3/0x5d0 > __elv_add_request+0x160/0x290 > blk_flush_plug_list+0x204/0x230 > schedule+0x87/0xe0 > __write_rt_lock+0x18b/0x1a0 > rt_write_lock+0x2a/0x30 > create_object+0x17d/0x2b0 > kmemleak_alloc+0x34/0x50 > __kmalloc_node+0x1cd/0x340 > alloc_request_size+0x30/0x70 > mempool_alloc+0x65/0x170 > ? ioc_lookup_icq+0x54/0x70 > get_request+0x4e3/0x8d0 > ? wait_woken+0x80/0x80 > blk_queue_bio+0x153/0x470 > generic_make_request+0x1dc/0x3f0 > submit_bio+0x49/0x140 > ? next_bio+0x38/0x40 > submit_bio_wait+0x59/0x90 > blkdev_issue_discard+0x7a/0xd0 > ? _raw_spin_unlock_irqrestore+0x18/0x50 > blk_ioctl_discard+0xc7/0x110 > blkdev_ioctl+0x57e/0x960 > ? __wake_up+0x13/0x20 > block_ioctl+0x3d/0x50 > do_vfs_ioctl+0xa8/0x610 > ? vfs_write+0x166/0x1b0 > ksys_ioctl+0x67/0x90 > __x64_sys_ioctl+0x1a/0x20 > do_syscall_64+0x4d/0xf0 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > kmemleak is an error detecting feature. We would not expect as good performance > as without it. As there is no raw rwlock defining helpers, we turn kmemleak_lock > to a raw spinlock. > > Signed-off-by: He Zhe <zhe.he@windriver.com> > Cc: catalin.marinas@arm.com > Cc: bigeasy@linutronix.de > Cc: tglx@linutronix.de > Cc: rostedt@goodmis.org As I replied already, I don't think this patch would increase the kmemleak latency (or performance), although I haven't actually tested it. FWIW: Acked-by: Catalin Marinas <catalin.marinas@arm.com>
On 2018-12-18 15:07:45 [+0000], Catalin Marinas wrote: … > It may be worth running some performance/latency tests during kmemleak > scanning (echo scan > /sys/kernel/debug/kmemleak) but at a quick look, > I don't think we'd see any difference with a raw_spin_lock_t. > > With a bit more thinking (though I'll be off until the new year), we > could probably get rid of the kmemleak_lock entirely in scan_block() and > make lookup_object() and the related rbtree code in kmemleak RCU-safe. Okay. So let me apply that patch into my RT tree with your ack (from the other email). And then I hope that it either shows up upstream or gets replaced with RCU in the ende :) Thanks. Sebastian
On 2018/12/19 23:30, Sebastian Andrzej Siewior wrote: > On 2018-12-18 15:07:45 [+0000], Catalin Marinas wrote: > … >> It may be worth running some performance/latency tests during kmemleak >> scanning (echo scan > /sys/kernel/debug/kmemleak) but at a quick look, >> I don't think we'd see any difference with a raw_spin_lock_t. >> >> With a bit more thinking (though I'll be off until the new year), we >> could probably get rid of the kmemleak_lock entirely in scan_block() and >> make lookup_object() and the related rbtree code in kmemleak RCU-safe. > Okay. So let me apply that patch into my RT tree with your ack (from the > other email). And then I hope that it either shows up upstream or gets > replaced with RCU in the ende :) I'd like to do the upstreaming or replacing. Thanks. Zhe > > Thanks. > > Sebastian >
diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 17dd883..b68a3d0 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -26,7 +26,7 @@ * * The following locks and mutexes are used by kmemleak: * - * - kmemleak_lock (rwlock): protects the object_list modifications and + * - kmemleak_lock (raw spinlock): protects the object_list modifications and * accesses to the object_tree_root. The object_list is the main list * holding the metadata (struct kmemleak_object) for the allocated memory * blocks. The object_tree_root is a red black tree used to look-up @@ -197,7 +197,7 @@ static LIST_HEAD(gray_list); /* search tree for object boundaries */ static struct rb_root object_tree_root = RB_ROOT; /* rw_lock protecting the access to object_list and object_tree_root */ -static DEFINE_RWLOCK(kmemleak_lock); +static DEFINE_RAW_SPINLOCK(kmemleak_lock); /* allocation caches for kmemleak internal data */ static struct kmem_cache *object_cache; @@ -491,9 +491,9 @@ static struct kmemleak_object *find_and_get_object(unsigned long ptr, int alias) struct kmemleak_object *object; rcu_read_lock(); - read_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); object = lookup_object(ptr, alias); - read_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); /* check whether the object is still available */ if (object && !get_object(object)) @@ -513,13 +513,13 @@ static struct kmemleak_object *find_and_remove_object(unsigned long ptr, int ali unsigned long flags; struct kmemleak_object *object; - write_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); object = lookup_object(ptr, alias); if (object) { rb_erase(&object->rb_node, &object_tree_root); list_del_rcu(&object->object_list); } - write_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); return object; } @@ -593,7 +593,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, /* kernel backtrace */ object->trace_len = __save_stack_trace(object->trace); - write_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); min_addr = min(min_addr, ptr); max_addr = max(max_addr, ptr + size); @@ -624,7 +624,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, list_add_tail_rcu(&object->object_list, &object_list); out: - write_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); return object; } @@ -1310,7 +1310,7 @@ static void scan_block(void *_start, void *_end, unsigned long *end = _end - (BYTES_PER_POINTER - 1); unsigned long flags; - read_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); for (ptr = start; ptr < end; ptr++) { struct kmemleak_object *object; unsigned long pointer; @@ -1367,7 +1367,7 @@ static void scan_block(void *_start, void *_end, spin_unlock(&object->lock); } } - read_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); } /*