diff mbox series

[v2] mm/kmemleak: Fix sleeping function called from invalid context at print message

Message ID 20241217142032.55793-1-acarmina@redhat.com (mailing list archive)
State New
Headers show
Series [v2] mm/kmemleak: Fix sleeping function called from invalid context at print message | expand

Commit Message

Alessandro Carminati Dec. 17, 2024, 2:20 p.m. UTC
Address a bug in the kernel that triggers a "sleeping function called from
invalid context" warning when /sys/kernel/debug/kmemleak is printed under
specific conditions:
- CONFIG_PREEMPT_RT=y
- Set SELinux as the LSM for the system
- Set kptr_restrict to 1
- kmemleak buffer contains at least one item

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat
preempt_count: 1, expected: 0
RCU nest depth: 2, expected: 2
6 locks held by cat/136:
 #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30
 #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128
 #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0
 #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0
 #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0
irq event stamp: 136660
hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8
hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0
softirqs last  enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E      6.11.0-rt7+ #34
Tainted: [E]=UNSIGNED_MODULE
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0xa0/0x128
 show_stack+0x1c/0x30
 dump_stack_lvl+0xe8/0x198
 dump_stack+0x18/0x20
 rt_spin_lock+0x8c/0x1a8
 avc_perm_nonode+0xa0/0x150
 cred_has_capability.isra.0+0x118/0x218
 selinux_capable+0x50/0x80
 security_capable+0x7c/0xd0
 has_ns_capability_noaudit+0x94/0x1b0
 has_capability_noaudit+0x20/0x30
 restricted_pointer+0x21c/0x4b0
 pointer+0x298/0x760
 vsnprintf+0x330/0xf70
 seq_printf+0x178/0x218
 print_unreferenced+0x1a4/0x2d0
 kmemleak_seq_show+0xd0/0x1e0
 seq_read_iter+0x354/0xe30
 seq_read+0x250/0x378
 full_proxy_read+0xd8/0x148
 vfs_read+0x190/0x918
 ksys_read+0xf0/0x1e0
 __arm64_sys_read+0x70/0xa8
 invoke_syscall.constprop.0+0xd4/0x1d8
 el0_svc+0x50/0x158
 el0t_64_sync+0x17c/0x180

%pS and %pK, in the same back trace line, are redundant, and %pS can void
%pK service in certain contexts.

%pS alone already provides the necessary information, and if it cannot
resolve the symbol, it falls back to printing the raw address voiding
the original intent behind the %pK.

Additionally, %pK requires a privilege check CAP_SYSLOG enforced through
the LSM, which can trigger a "sleeping function called from invalid
context" warning under RT_PREEMPT kernels when the check occurs in an
atomic context. This issue may also affect other LSMs.

This change avoids the unnecessary privilege check and resolves the
sleeping function warning without any loss of information.

Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
---
 mm/kmemleak.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrew Morton Dec. 17, 2024, 5:55 p.m. UTC | #1
On Tue, 17 Dec 2024 14:20:33 +0000 Alessandro Carminati <acarmina@redhat.com> wrote:

> Address a bug in the kernel that triggers a "sleeping function called from
> invalid context" warning when /sys/kernel/debug/kmemleak is printed under
> specific conditions:
> - CONFIG_PREEMPT_RT=y
> - Set SELinux as the LSM for the system
> - Set kptr_restrict to 1
> - kmemleak buffer contains at least one item
> 
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat

-rt is a bit annoying this way.  Things which we expect to work OK are
no longer doing so.

> preempt_count: 1, expected: 0
> RCU nest depth: 2, expected: 2
> 6 locks held by cat/136:
>  #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30
>  #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128
>  #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0
>  #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0
>  #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0
> irq event stamp: 136660
> hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8
> hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0
> softirqs last  enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8
> softirqs last disabled at (0): [<0000000000000000>] 0x0
> Preemption disabled at:
> [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
> CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E      6.11.0-rt7+ #34
> Tainted: [E]=UNSIGNED_MODULE
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace+0xa0/0x128
>  show_stack+0x1c/0x30
>  dump_stack_lvl+0xe8/0x198
>  dump_stack+0x18/0x20
>  rt_spin_lock+0x8c/0x1a8
>  avc_perm_nonode+0xa0/0x150
>  cred_has_capability.isra.0+0x118/0x218
>  selinux_capable+0x50/0x80
>  security_capable+0x7c/0xd0
>  has_ns_capability_noaudit+0x94/0x1b0
>  has_capability_noaudit+0x20/0x30
>  restricted_pointer+0x21c/0x4b0
>  pointer+0x298/0x760
>  vsnprintf+0x330/0xf70
>  seq_printf+0x178/0x218
>  print_unreferenced+0x1a4/0x2d0
>  kmemleak_seq_show+0xd0/0x1e0
>  seq_read_iter+0x354/0xe30
>  seq_read+0x250/0x378
>  full_proxy_read+0xd8/0x148
>  vfs_read+0x190/0x918
>  ksys_read+0xf0/0x1e0
>  __arm64_sys_read+0x70/0xa8
>  invoke_syscall.constprop.0+0xd4/0x1d8
>  el0_svc+0x50/0x158
>  el0t_64_sync+0x17c/0x180
> 
> %pS and %pK, in the same back trace line, are redundant, and %pS can void
> %pK service in certain contexts.
> 
> %pS alone already provides the necessary information, and if it cannot
> resolve the symbol, it falls back to printing the raw address voiding
> the original intent behind the %pK.
> 
> Additionally, %pK requires a privilege check CAP_SYSLOG enforced through
> the LSM, which can trigger a "sleeping function called from invalid
> context" warning under RT_PREEMPT kernels when the check occurs in an
> atomic context. This issue may also affect other LSMs.
> 
> This change avoids the unnecessary privilege check and resolves the
> sleeping function warning without any loss of information.
> 
> Signed-off-by: Alessandro Carminati <acarmina@redhat.com>

I'm adding

Fixes: 3a6f33d86baa ("mm/kmemleak: use %pK to display kernel pointers in backtrace")
Cc: <stable@vger.kernel.org>

> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -373,7 +373,7 @@ static void print_unreferenced(struct seq_file *seq,
>  
>  	for (i = 0; i < nr_entries; i++) {
>  		void *ptr = (void *)entries[i];
> -		warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> +		warn_or_seq_printf(seq, "    %pS\n", ptr);
>  	}
>  }

Before 3a6f33d86baa we were still printing the address, with plain old
%p.  Should we restore that, or is %p always useless?
Alessandro Carminati Dec. 18, 2024, 8:38 a.m. UTC | #2
Hello Andrew,
Thanks for the feedback.

On Tue, Dec 17, 2024 at 6:55 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 17 Dec 2024 14:20:33 +0000 Alessandro Carminati <acarmina@redhat.com> wrote:
>
> > Address a bug in the kernel that triggers a "sleeping function called from
> > invalid context" warning when /sys/kernel/debug/kmemleak is printed under
> > specific conditions:
> > - CONFIG_PREEMPT_RT=y
> > - Set SELinux as the LSM for the system
> > - Set kptr_restrict to 1
> > - kmemleak buffer contains at least one item
> >
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat
>
> -rt is a bit annoying this way.  Things which we expect to work OK are
> no longer doing so.
>
> > preempt_count: 1, expected: 0
> > RCU nest depth: 2, expected: 2
> > 6 locks held by cat/136:
> >  #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30
> >  #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128
> >  #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0
> >  #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0
> >  #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0
> > irq event stamp: 136660
> > hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8
> > hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0
> > softirqs last  enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8
> > softirqs last disabled at (0): [<0000000000000000>] 0x0
> > Preemption disabled at:
> > [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
> > CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E      6.11.0-rt7+ #34
> > Tainted: [E]=UNSIGNED_MODULE
> > Hardware name: linux,dummy-virt (DT)
> > Call trace:
> >  dump_backtrace+0xa0/0x128
> >  show_stack+0x1c/0x30
> >  dump_stack_lvl+0xe8/0x198
> >  dump_stack+0x18/0x20
> >  rt_spin_lock+0x8c/0x1a8
> >  avc_perm_nonode+0xa0/0x150
> >  cred_has_capability.isra.0+0x118/0x218
> >  selinux_capable+0x50/0x80
> >  security_capable+0x7c/0xd0
> >  has_ns_capability_noaudit+0x94/0x1b0
> >  has_capability_noaudit+0x20/0x30
> >  restricted_pointer+0x21c/0x4b0
> >  pointer+0x298/0x760
> >  vsnprintf+0x330/0xf70
> >  seq_printf+0x178/0x218
> >  print_unreferenced+0x1a4/0x2d0
> >  kmemleak_seq_show+0xd0/0x1e0
> >  seq_read_iter+0x354/0xe30
> >  seq_read+0x250/0x378
> >  full_proxy_read+0xd8/0x148
> >  vfs_read+0x190/0x918
> >  ksys_read+0xf0/0x1e0
> >  __arm64_sys_read+0x70/0xa8
> >  invoke_syscall.constprop.0+0xd4/0x1d8
> >  el0_svc+0x50/0x158
> >  el0t_64_sync+0x17c/0x180
> >
> > %pS and %pK, in the same back trace line, are redundant, and %pS can void
> > %pK service in certain contexts.
> >
> > %pS alone already provides the necessary information, and if it cannot
> > resolve the symbol, it falls back to printing the raw address voiding
> > the original intent behind the %pK.
> >
> > Additionally, %pK requires a privilege check CAP_SYSLOG enforced through
> > the LSM, which can trigger a "sleeping function called from invalid
> > context" warning under RT_PREEMPT kernels when the check occurs in an
> > atomic context. This issue may also affect other LSMs.
> >
> > This change avoids the unnecessary privilege check and resolves the
> > sleeping function warning without any loss of information.
> >
> > Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
>
> I'm adding
>
> Fixes: 3a6f33d86baa ("mm/kmemleak: use %pK to display kernel pointers in backtrace")
> Cc: <stable@vger.kernel.org>
>
> > --- a/mm/kmemleak.c
> > +++ b/mm/kmemleak.c
> > @@ -373,7 +373,7 @@ static void print_unreferenced(struct seq_file *seq,
> >
> >       for (i = 0; i < nr_entries; i++) {
> >               void *ptr = (void *)entries[i];
> > -             warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> > +             warn_or_seq_printf(seq, "    %pS\n", ptr);
> >       }
> >  }
>
> Before 3a6f33d86baa we were still printing the address, with plain old
> %p.  Should we restore that, or is %p always useless?
>

The intent of 3a6f33d86baa8 is to make pointers available for debugging
purposes.
My argument for this change is that %pS provides already sufficient
information for debugging purposes.

We essentially have two scenarios concerning %pS:
1. When the symbol can be resolved by kallsyms: In this case, a
   symbol+offset style pointer is emitted, which I find more useful than
   just the address.
2. When the symbol cannot be resolved by kallsyms: Here, the raw pointer
   is emitted. This undermines the original intent of %pK, as both the
   masked address and the plain address are exposed.

If we want to be picky, there's an additional factor to consider: KASLR.
%pS emits its output accounting for randomization, but in most cases, the
randomized offset information is lost.
The exception is when the symbol cannot be resolved by kallsyms, though
this should be rare. It's still worth mentioning.

Back to your question: %p emits an hashed address that is not
reversible by design.
Therefore, in my opinion, it is not particularly useful in a debug
message.
Sebastian Andrzej Siewior Dec. 18, 2024, 4:17 p.m. UTC | #3
On 2024-12-18 09:38:05 [+0100], Alessandro Carminati wrote:
> On Tue, Dec 17, 2024 at 6:55 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 17 Dec 2024 14:20:33 +0000 Alessandro Carminati <acarmina@redhat.com> wrote:
> >
> > > Address a bug in the kernel that triggers a "sleeping function called from
> > > invalid context" warning when /sys/kernel/debug/kmemleak is printed under
> > > specific conditions:
> > > - CONFIG_PREEMPT_RT=y
> > > - Set SELinux as the LSM for the system
> > > - Set kptr_restrict to 1
> > > - kmemleak buffer contains at least one item
> > >
> > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat
> >
> > -rt is a bit annoying this way.  Things which we expect to work OK are
> > no longer doing so.

Yeah. But lockdep should have complained here earlier. The RT just
enforces that part.

> > > --- a/mm/kmemleak.c
> > > +++ b/mm/kmemleak.c
> > > @@ -373,7 +373,7 @@ static void print_unreferenced(struct seq_file *seq,
> > >
> > >       for (i = 0; i < nr_entries; i++) {
> > >               void *ptr = (void *)entries[i];
> > > -             warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> > > +             warn_or_seq_printf(seq, "    %pS\n", ptr);
> > >       }
> > >  }
> >
> > Before 3a6f33d86baa we were still printing the address, with plain old
> > %p.  Should we restore that, or is %p always useless?
> >
> 
> The intent of 3a6f33d86baa8 is to make pointers available for debugging
> purposes.
> My argument for this change is that %pS provides already sufficient
> information for debugging purposes.
> 
> We essentially have two scenarios concerning %pS:
> 1. When the symbol can be resolved by kallsyms: In this case, a
>    symbol+offset style pointer is emitted, which I find more useful than
>    just the address.

indeed. With the relocation the information is usually hard to make
sense.

> 2. When the symbol cannot be resolved by kallsyms: Here, the raw pointer
>    is emitted. This undermines the original intent of %pK, as both the
>    masked address and the plain address are exposed.
> 
> If we want to be picky, there's an additional factor to consider: KASLR.
> %pS emits its output accounting for randomization, but in most cases, the
> randomized offset information is lost.
> The exception is when the symbol cannot be resolved by kallsyms, though
> this should be rare. It's still worth mentioning.
> 
> Back to your question: %p emits an hashed address that is not
> reversible by design.
> Therefore, in my opinion, it is not particularly useful in a debug
> message.

There is still %px if you want your debug message to show the raw
pointer regardless of pointer hashing. That is something that you could
disable. The %pK format is something that was meant to be used files
like proc or sysfs. With the restrictions you already have on dmesg it
does not seem necessary to have yet another check.

Thank you the patch.

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian
Catalin Marinas Dec. 19, 2024, 4:55 p.m. UTC | #4
On Tue, Dec 17, 2024 at 02:20:33PM +0000, Alessandro Carminati wrote:
> Address a bug in the kernel that triggers a "sleeping function called from
> invalid context" warning when /sys/kernel/debug/kmemleak is printed under
> specific conditions:
> - CONFIG_PREEMPT_RT=y
> - Set SELinux as the LSM for the system
> - Set kptr_restrict to 1
> - kmemleak buffer contains at least one item
> 
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat
> preempt_count: 1, expected: 0
> RCU nest depth: 2, expected: 2
> 6 locks held by cat/136:
>  #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30
>  #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128
>  #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0
>  #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0
>  #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0
> irq event stamp: 136660
> hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8
> hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0
> softirqs last  enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8
> softirqs last disabled at (0): [<0000000000000000>] 0x0
> Preemption disabled at:
> [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
> CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E      6.11.0-rt7+ #34
> Tainted: [E]=UNSIGNED_MODULE
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace+0xa0/0x128
>  show_stack+0x1c/0x30
>  dump_stack_lvl+0xe8/0x198
>  dump_stack+0x18/0x20
>  rt_spin_lock+0x8c/0x1a8
>  avc_perm_nonode+0xa0/0x150
>  cred_has_capability.isra.0+0x118/0x218
>  selinux_capable+0x50/0x80
>  security_capable+0x7c/0xd0
>  has_ns_capability_noaudit+0x94/0x1b0
>  has_capability_noaudit+0x20/0x30
>  restricted_pointer+0x21c/0x4b0
>  pointer+0x298/0x760
>  vsnprintf+0x330/0xf70
>  seq_printf+0x178/0x218
>  print_unreferenced+0x1a4/0x2d0
>  kmemleak_seq_show+0xd0/0x1e0
>  seq_read_iter+0x354/0xe30
>  seq_read+0x250/0x378
>  full_proxy_read+0xd8/0x148
>  vfs_read+0x190/0x918
>  ksys_read+0xf0/0x1e0
>  __arm64_sys_read+0x70/0xa8
>  invoke_syscall.constprop.0+0xd4/0x1d8
>  el0_svc+0x50/0x158
>  el0t_64_sync+0x17c/0x180
> 
> %pS and %pK, in the same back trace line, are redundant, and %pS can void
> %pK service in certain contexts.
> 
> %pS alone already provides the necessary information, and if it cannot
> resolve the symbol, it falls back to printing the raw address voiding
> the original intent behind the %pK.
> 
> Additionally, %pK requires a privilege check CAP_SYSLOG enforced through
> the LSM, which can trigger a "sleeping function called from invalid
> context" warning under RT_PREEMPT kernels when the check occurs in an
> atomic context. This issue may also affect other LSMs.
> 
> This change avoids the unnecessary privilege check and resolves the
> sleeping function warning without any loss of information.
> 
> Signed-off-by: Alessandro Carminati <acarmina@redhat.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
diff mbox series

Patch

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 2a945c07ae99..737af23f4f4e 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -373,7 +373,7 @@  static void print_unreferenced(struct seq_file *seq,
 
 	for (i = 0; i < nr_entries; i++) {
 		void *ptr = (void *)entries[i];
-		warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
+		warn_or_seq_printf(seq, "    %pS\n", ptr);
 	}
 }