Message ID | 20241217142032.55793-1-acarmina@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] mm/kmemleak: Fix sleeping function called from invalid context at print message | expand |
On Tue, 17 Dec 2024 14:20:33 +0000 Alessandro Carminati <acarmina@redhat.com> wrote: > Address a bug in the kernel that triggers a "sleeping function called from > invalid context" warning when /sys/kernel/debug/kmemleak is printed under > specific conditions: > - CONFIG_PREEMPT_RT=y > - Set SELinux as the LSM for the system > - Set kptr_restrict to 1 > - kmemleak buffer contains at least one item > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat -rt is a bit annoying this way. Things which we expect to work OK are no longer doing so. > preempt_count: 1, expected: 0 > RCU nest depth: 2, expected: 2 > 6 locks held by cat/136: > #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30 > #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128 > #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0 > #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0 > #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0 > irq event stamp: 136660 > hardirqs last enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8 > hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0 > softirqs last enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8 > softirqs last disabled at (0): [<0000000000000000>] 0x0 > Preemption disabled at: > [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0 > CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G E 6.11.0-rt7+ #34 > Tainted: [E]=UNSIGNED_MODULE > Hardware name: linux,dummy-virt (DT) > Call trace: > dump_backtrace+0xa0/0x128 > show_stack+0x1c/0x30 > dump_stack_lvl+0xe8/0x198 > dump_stack+0x18/0x20 > rt_spin_lock+0x8c/0x1a8 > avc_perm_nonode+0xa0/0x150 > cred_has_capability.isra.0+0x118/0x218 > selinux_capable+0x50/0x80 > security_capable+0x7c/0xd0 > has_ns_capability_noaudit+0x94/0x1b0 > has_capability_noaudit+0x20/0x30 > restricted_pointer+0x21c/0x4b0 > pointer+0x298/0x760 > vsnprintf+0x330/0xf70 > seq_printf+0x178/0x218 > print_unreferenced+0x1a4/0x2d0 > kmemleak_seq_show+0xd0/0x1e0 > seq_read_iter+0x354/0xe30 > seq_read+0x250/0x378 > full_proxy_read+0xd8/0x148 > vfs_read+0x190/0x918 > ksys_read+0xf0/0x1e0 > __arm64_sys_read+0x70/0xa8 > invoke_syscall.constprop.0+0xd4/0x1d8 > el0_svc+0x50/0x158 > el0t_64_sync+0x17c/0x180 > > %pS and %pK, in the same back trace line, are redundant, and %pS can void > %pK service in certain contexts. > > %pS alone already provides the necessary information, and if it cannot > resolve the symbol, it falls back to printing the raw address voiding > the original intent behind the %pK. > > Additionally, %pK requires a privilege check CAP_SYSLOG enforced through > the LSM, which can trigger a "sleeping function called from invalid > context" warning under RT_PREEMPT kernels when the check occurs in an > atomic context. This issue may also affect other LSMs. > > This change avoids the unnecessary privilege check and resolves the > sleeping function warning without any loss of information. > > Signed-off-by: Alessandro Carminati <acarmina@redhat.com> I'm adding Fixes: 3a6f33d86baa ("mm/kmemleak: use %pK to display kernel pointers in backtrace") Cc: <stable@vger.kernel.org> > --- a/mm/kmemleak.c > +++ b/mm/kmemleak.c > @@ -373,7 +373,7 @@ static void print_unreferenced(struct seq_file *seq, > > for (i = 0; i < nr_entries; i++) { > void *ptr = (void *)entries[i]; > - warn_or_seq_printf(seq, " [<%pK>] %pS\n", ptr, ptr); > + warn_or_seq_printf(seq, " %pS\n", ptr); > } > } Before 3a6f33d86baa we were still printing the address, with plain old %p. Should we restore that, or is %p always useless?
Hello Andrew, Thanks for the feedback. On Tue, Dec 17, 2024 at 6:55 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Tue, 17 Dec 2024 14:20:33 +0000 Alessandro Carminati <acarmina@redhat.com> wrote: > > > Address a bug in the kernel that triggers a "sleeping function called from > > invalid context" warning when /sys/kernel/debug/kmemleak is printed under > > specific conditions: > > - CONFIG_PREEMPT_RT=y > > - Set SELinux as the LSM for the system > > - Set kptr_restrict to 1 > > - kmemleak buffer contains at least one item > > > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat > > -rt is a bit annoying this way. Things which we expect to work OK are > no longer doing so. > > > preempt_count: 1, expected: 0 > > RCU nest depth: 2, expected: 2 > > 6 locks held by cat/136: > > #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30 > > #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128 > > #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0 > > #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0 > > #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0 > > irq event stamp: 136660 > > hardirqs last enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8 > > hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0 > > softirqs last enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8 > > softirqs last disabled at (0): [<0000000000000000>] 0x0 > > Preemption disabled at: > > [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0 > > CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G E 6.11.0-rt7+ #34 > > Tainted: [E]=UNSIGNED_MODULE > > Hardware name: linux,dummy-virt (DT) > > Call trace: > > dump_backtrace+0xa0/0x128 > > show_stack+0x1c/0x30 > > dump_stack_lvl+0xe8/0x198 > > dump_stack+0x18/0x20 > > rt_spin_lock+0x8c/0x1a8 > > avc_perm_nonode+0xa0/0x150 > > cred_has_capability.isra.0+0x118/0x218 > > selinux_capable+0x50/0x80 > > security_capable+0x7c/0xd0 > > has_ns_capability_noaudit+0x94/0x1b0 > > has_capability_noaudit+0x20/0x30 > > restricted_pointer+0x21c/0x4b0 > > pointer+0x298/0x760 > > vsnprintf+0x330/0xf70 > > seq_printf+0x178/0x218 > > print_unreferenced+0x1a4/0x2d0 > > kmemleak_seq_show+0xd0/0x1e0 > > seq_read_iter+0x354/0xe30 > > seq_read+0x250/0x378 > > full_proxy_read+0xd8/0x148 > > vfs_read+0x190/0x918 > > ksys_read+0xf0/0x1e0 > > __arm64_sys_read+0x70/0xa8 > > invoke_syscall.constprop.0+0xd4/0x1d8 > > el0_svc+0x50/0x158 > > el0t_64_sync+0x17c/0x180 > > > > %pS and %pK, in the same back trace line, are redundant, and %pS can void > > %pK service in certain contexts. > > > > %pS alone already provides the necessary information, and if it cannot > > resolve the symbol, it falls back to printing the raw address voiding > > the original intent behind the %pK. > > > > Additionally, %pK requires a privilege check CAP_SYSLOG enforced through > > the LSM, which can trigger a "sleeping function called from invalid > > context" warning under RT_PREEMPT kernels when the check occurs in an > > atomic context. This issue may also affect other LSMs. > > > > This change avoids the unnecessary privilege check and resolves the > > sleeping function warning without any loss of information. > > > > Signed-off-by: Alessandro Carminati <acarmina@redhat.com> > > I'm adding > > Fixes: 3a6f33d86baa ("mm/kmemleak: use %pK to display kernel pointers in backtrace") > Cc: <stable@vger.kernel.org> > > > --- a/mm/kmemleak.c > > +++ b/mm/kmemleak.c > > @@ -373,7 +373,7 @@ static void print_unreferenced(struct seq_file *seq, > > > > for (i = 0; i < nr_entries; i++) { > > void *ptr = (void *)entries[i]; > > - warn_or_seq_printf(seq, " [<%pK>] %pS\n", ptr, ptr); > > + warn_or_seq_printf(seq, " %pS\n", ptr); > > } > > } > > Before 3a6f33d86baa we were still printing the address, with plain old > %p. Should we restore that, or is %p always useless? > The intent of 3a6f33d86baa8 is to make pointers available for debugging purposes. My argument for this change is that %pS provides already sufficient information for debugging purposes. We essentially have two scenarios concerning %pS: 1. When the symbol can be resolved by kallsyms: In this case, a symbol+offset style pointer is emitted, which I find more useful than just the address. 2. When the symbol cannot be resolved by kallsyms: Here, the raw pointer is emitted. This undermines the original intent of %pK, as both the masked address and the plain address are exposed. If we want to be picky, there's an additional factor to consider: KASLR. %pS emits its output accounting for randomization, but in most cases, the randomized offset information is lost. The exception is when the symbol cannot be resolved by kallsyms, though this should be rare. It's still worth mentioning. Back to your question: %p emits an hashed address that is not reversible by design. Therefore, in my opinion, it is not particularly useful in a debug message.
On 2024-12-18 09:38:05 [+0100], Alessandro Carminati wrote: > On Tue, Dec 17, 2024 at 6:55 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Tue, 17 Dec 2024 14:20:33 +0000 Alessandro Carminati <acarmina@redhat.com> wrote: > > > > > Address a bug in the kernel that triggers a "sleeping function called from > > > invalid context" warning when /sys/kernel/debug/kmemleak is printed under > > > specific conditions: > > > - CONFIG_PREEMPT_RT=y > > > - Set SELinux as the LSM for the system > > > - Set kptr_restrict to 1 > > > - kmemleak buffer contains at least one item > > > > > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 > > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat > > > > -rt is a bit annoying this way. Things which we expect to work OK are > > no longer doing so. Yeah. But lockdep should have complained here earlier. The RT just enforces that part. > > > --- a/mm/kmemleak.c > > > +++ b/mm/kmemleak.c > > > @@ -373,7 +373,7 @@ static void print_unreferenced(struct seq_file *seq, > > > > > > for (i = 0; i < nr_entries; i++) { > > > void *ptr = (void *)entries[i]; > > > - warn_or_seq_printf(seq, " [<%pK>] %pS\n", ptr, ptr); > > > + warn_or_seq_printf(seq, " %pS\n", ptr); > > > } > > > } > > > > Before 3a6f33d86baa we were still printing the address, with plain old > > %p. Should we restore that, or is %p always useless? > > > > The intent of 3a6f33d86baa8 is to make pointers available for debugging > purposes. > My argument for this change is that %pS provides already sufficient > information for debugging purposes. > > We essentially have two scenarios concerning %pS: > 1. When the symbol can be resolved by kallsyms: In this case, a > symbol+offset style pointer is emitted, which I find more useful than > just the address. indeed. With the relocation the information is usually hard to make sense. > 2. When the symbol cannot be resolved by kallsyms: Here, the raw pointer > is emitted. This undermines the original intent of %pK, as both the > masked address and the plain address are exposed. > > If we want to be picky, there's an additional factor to consider: KASLR. > %pS emits its output accounting for randomization, but in most cases, the > randomized offset information is lost. > The exception is when the symbol cannot be resolved by kallsyms, though > this should be rare. It's still worth mentioning. > > Back to your question: %p emits an hashed address that is not > reversible by design. > Therefore, in my opinion, it is not particularly useful in a debug > message. There is still %px if you want your debug message to show the raw pointer regardless of pointer hashing. That is something that you could disable. The %pK format is something that was meant to be used files like proc or sysfs. With the restrictions you already have on dmesg it does not seem necessary to have yet another check. Thank you the patch. Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Sebastian
On Tue, Dec 17, 2024 at 02:20:33PM +0000, Alessandro Carminati wrote: > Address a bug in the kernel that triggers a "sleeping function called from > invalid context" warning when /sys/kernel/debug/kmemleak is printed under > specific conditions: > - CONFIG_PREEMPT_RT=y > - Set SELinux as the LSM for the system > - Set kptr_restrict to 1 > - kmemleak buffer contains at least one item > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat > preempt_count: 1, expected: 0 > RCU nest depth: 2, expected: 2 > 6 locks held by cat/136: > #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30 > #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128 > #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0 > #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0 > #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0 > irq event stamp: 136660 > hardirqs last enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8 > hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0 > softirqs last enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8 > softirqs last disabled at (0): [<0000000000000000>] 0x0 > Preemption disabled at: > [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0 > CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G E 6.11.0-rt7+ #34 > Tainted: [E]=UNSIGNED_MODULE > Hardware name: linux,dummy-virt (DT) > Call trace: > dump_backtrace+0xa0/0x128 > show_stack+0x1c/0x30 > dump_stack_lvl+0xe8/0x198 > dump_stack+0x18/0x20 > rt_spin_lock+0x8c/0x1a8 > avc_perm_nonode+0xa0/0x150 > cred_has_capability.isra.0+0x118/0x218 > selinux_capable+0x50/0x80 > security_capable+0x7c/0xd0 > has_ns_capability_noaudit+0x94/0x1b0 > has_capability_noaudit+0x20/0x30 > restricted_pointer+0x21c/0x4b0 > pointer+0x298/0x760 > vsnprintf+0x330/0xf70 > seq_printf+0x178/0x218 > print_unreferenced+0x1a4/0x2d0 > kmemleak_seq_show+0xd0/0x1e0 > seq_read_iter+0x354/0xe30 > seq_read+0x250/0x378 > full_proxy_read+0xd8/0x148 > vfs_read+0x190/0x918 > ksys_read+0xf0/0x1e0 > __arm64_sys_read+0x70/0xa8 > invoke_syscall.constprop.0+0xd4/0x1d8 > el0_svc+0x50/0x158 > el0t_64_sync+0x17c/0x180 > > %pS and %pK, in the same back trace line, are redundant, and %pS can void > %pK service in certain contexts. > > %pS alone already provides the necessary information, and if it cannot > resolve the symbol, it falls back to printing the raw address voiding > the original intent behind the %pK. > > Additionally, %pK requires a privilege check CAP_SYSLOG enforced through > the LSM, which can trigger a "sleeping function called from invalid > context" warning under RT_PREEMPT kernels when the check occurs in an > atomic context. This issue may also affect other LSMs. > > This change avoids the unnecessary privilege check and resolves the > sleeping function warning without any loss of information. > > Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 2a945c07ae99..737af23f4f4e 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -373,7 +373,7 @@ static void print_unreferenced(struct seq_file *seq, for (i = 0; i < nr_entries; i++) { void *ptr = (void *)entries[i]; - warn_or_seq_printf(seq, " [<%pK>] %pS\n", ptr, ptr); + warn_or_seq_printf(seq, " %pS\n", ptr); } }
Address a bug in the kernel that triggers a "sleeping function called from invalid context" warning when /sys/kernel/debug/kmemleak is printed under specific conditions: - CONFIG_PREEMPT_RT=y - Set SELinux as the LSM for the system - Set kptr_restrict to 1 - kmemleak buffer contains at least one item BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 2 6 locks held by cat/136: #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30 #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128 #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0 #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0 #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0 irq event stamp: 136660 hardirqs last enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8 hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0 softirqs last enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8 softirqs last disabled at (0): [<0000000000000000>] 0x0 Preemption disabled at: [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0 CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G E 6.11.0-rt7+ #34 Tainted: [E]=UNSIGNED_MODULE Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0xa0/0x128 show_stack+0x1c/0x30 dump_stack_lvl+0xe8/0x198 dump_stack+0x18/0x20 rt_spin_lock+0x8c/0x1a8 avc_perm_nonode+0xa0/0x150 cred_has_capability.isra.0+0x118/0x218 selinux_capable+0x50/0x80 security_capable+0x7c/0xd0 has_ns_capability_noaudit+0x94/0x1b0 has_capability_noaudit+0x20/0x30 restricted_pointer+0x21c/0x4b0 pointer+0x298/0x760 vsnprintf+0x330/0xf70 seq_printf+0x178/0x218 print_unreferenced+0x1a4/0x2d0 kmemleak_seq_show+0xd0/0x1e0 seq_read_iter+0x354/0xe30 seq_read+0x250/0x378 full_proxy_read+0xd8/0x148 vfs_read+0x190/0x918 ksys_read+0xf0/0x1e0 __arm64_sys_read+0x70/0xa8 invoke_syscall.constprop.0+0xd4/0x1d8 el0_svc+0x50/0x158 el0t_64_sync+0x17c/0x180 %pS and %pK, in the same back trace line, are redundant, and %pS can void %pK service in certain contexts. %pS alone already provides the necessary information, and if it cannot resolve the symbol, it falls back to printing the raw address voiding the original intent behind the %pK. Additionally, %pK requires a privilege check CAP_SYSLOG enforced through the LSM, which can trigger a "sleeping function called from invalid context" warning under RT_PREEMPT kernels when the check occurs in an atomic context. This issue may also affect other LSMs. This change avoids the unnecessary privilege check and resolves the sleeping function warning without any loss of information. Signed-off-by: Alessandro Carminati <acarmina@redhat.com> --- mm/kmemleak.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)