Message ID | 218f2bef-5e5e-89c4-154b-24dc49c82c31@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Hi,

[auto build test ERROR on cgroup/for-next]
[also build test ERROR on v4.7-rc5]
[cannot apply to next-20160701]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:      https://github.com/0day-ci/linux/commits/Topi-Miettinen/capabilities-audit-capability-use/20160703-231120
base:     https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
config:   microblaze-mmu_defconfig (attached as .config)
compiler: microblaze-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=microblaze

All errors (new ones prefixed by >>):

>> kernel/audit.c:1713:6: error: redefinition of 'audit_log_cap_use'
    void audit_log_cap_use(int cap)
         ^
   In file included from kernel/audit.c:59:0:
   include/linux/audit.h:574:20: note: previous definition of 'audit_log_cap_use' was here
    static inline void audit_log_cap_use(int cap)
                       ^
   kernel/audit.c: In function 'audit_log_cap_use':
>> kernel/audit.c:1730:2: error: implicit declaration of function 'audit_cgroup_list' [-Werror=implicit-function-declaration]
     audit_cgroup_list(ab);
     ^
   cc1: some warnings being treated as errors

vim +/audit_log_cap_use +1713 kernel/audit.c

  1707
  1708      if (log)
  1709          audit_log_format(ab, " cap_fe=%d cap_fver=%x",
  1710                   name->fcap.fE, name->fcap_ver);
  1711  }
  1712
> 1713  void audit_log_cap_use(int cap)
  1714  {
  1715      struct audit_context *context = current->audit_context;
  1716      struct audit_buffer *ab;
  1717      kuid_t uid;
  1718      kgid_t gid;
  1719
  1720      ab = audit_log_start(context, GFP_KERNEL, AUDIT_CAPABILITY);
  1721      audit_log_format(ab, "cap_used=%d", cap);
  1722      current_uid_gid(&uid, &gid);
  1723      audit_log_format(ab, " pid=%d auid=%u uid=%u gid=%u ses=%u",
  1724               task_pid_nr(current),
  1725               from_kuid(&init_user_ns, audit_get_loginuid(current)),
  1726               from_kuid(&init_user_ns, uid),
  1727               from_kgid(&init_user_ns, gid),
  1728               audit_get_sessionid(current));
  1729      audit_log_format(ab, " cgroups=");
> 1730      audit_cgroup_list(ab);
  1731      audit_log_end(ab);
  1732  }
  1733

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
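
The first error in the report above looks like the usual mismatch between a config-dependent stub in a header and an unconditional out-of-line definition: the header supplies a `static inline` body for one configuration while the .c file defines the same function regardless. The following is a minimal userspace sketch of the pattern, not kernel code. The macro `CONFIG_DEMO_AUDITSYSCALL`, the function `demo_log_cap_use` and the counter `demo_calls` are hypothetical stand-ins, and whether the real fix should guard the definition or move the declaration is left open.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a Kconfig symbol. Removing this line models
 * the configuration in which only the empty stub exists. */
#define CONFIG_DEMO_AUDITSYSCALL 1

/* Counts how often the real (non-stub) implementation ran. */
int demo_calls;

/* --- what would live in the header --- */
#ifdef CONFIG_DEMO_AUDITSYSCALL
void demo_log_cap_use(int cap);
#else
static inline void demo_log_cap_use(int cap) { (void)cap; }
#endif

/* --- what would live in the .c file ---
 * The definition is guarded by the same symbol as the declaration, so it
 * can never collide with the static inline stub from the header. */
#ifdef CONFIG_DEMO_AUDITSYSCALL
void demo_log_cap_use(int cap)
{
	demo_calls++;
	printf("cap_used=%d\n", cap);
}
#endif
```

Dropping the second `#ifdef` while leaving the macro undefined reproduces a "redefinition" diagnostic of the kind shown in the report. The second error (implicit declaration) is the mirror image: a caller built in a configuration where no declaration is visible at all.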
On Sun 2016-07-03 15:08:07, Topi Miettinen wrote:
> The attached patch would make any uses of capabilities generate audit
> messages. It works for simple tests as you can see from the commit
> message, but unfortunately the call to audit_cgroup_list() deadlocks the
> system when booting a full blown OS. There's no deadlock when the call
> is removed.
>
> I guess that in some cases, cgroup_mutex and/or css_set_lock could be
> already held earlier before entering audit_cgroup_list(). Holding the
> locks is however required by task_cgroup_from_root(). Is there any way
> to avoid this? For example, only print some kind of cgroup ID numbers
> (are there unique and stable IDs, available without locks?) for those
> cgroups where the task is registered in the audit message?

I am not sure whether anyone knows what really happens here. I suggest
enabling lockdep. It might detect a possible deadlock even before it
really happens, see Documentation/locking/lockdep-design.txt

It can be enabled by

    CONFIG_PROVE_LOCKING=y

It depends on

    CONFIG_DEBUG_KERNEL=y

and maybe some more options, see lib/Kconfig.debug

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/07/16 09:16, Petr Mladek wrote:
> On Sun 2016-07-03 15:08:07, Topi Miettinen wrote:
>> [...]
>
> I am not sure if anyone know what really happens here. I suggest to
> enable lockdep. It might detect possible deadlock even before it
> really happens, see Documentation/locking/lockdep-design.txt
>
> It can be enabled by
>
>    CONFIG_PROVE_LOCKING=y
>
> It depends on
>
>     CONFIG_DEBUG_KERNEL=y
>
> and maybe some more options, see lib/Kconfig.debug

Thanks a lot! I caught this stack dump:

starting version 230
[ 3.416647] ------------[ cut here ]------------
[ 3.417310] WARNING: CPU: 0 PID: 95 at /home/topi/d/linux.git/kernel/locking/lockdep.c:2871 lockdep_trace_alloc+0xb4/0xc0
[ 3.417605] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[ 3.417923] Modules linked in:
[ 3.418288] CPU: 0 PID: 95 Comm: systemd-udevd Not tainted 4.7.0-rc5+ #97
[ 3.418444] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 3.418726]  0000000000000086 000000007970f3b0 ffff88000016fb00 ffffffff813c9c45
[ 3.418993]  ffff88000016fb50 0000000000000000 ffff88000016fb40 ffffffff81091e9b
[ 3.419176]  00000b3705e2c798 0000000000000046 0000000000000410 00000000ffffffff
[ 3.419374] Call Trace:
[ 3.419511]  [<ffffffff813c9c45>] dump_stack+0x67/0x92
[ 3.419644]  [<ffffffff81091e9b>] __warn+0xcb/0xf0
[ 3.419745]  [<ffffffff81091f1f>] warn_slowpath_fmt+0x5f/0x80
[ 3.419868]  [<ffffffff810e9a84>] lockdep_trace_alloc+0xb4/0xc0
[ 3.419988]  [<ffffffff8120dc42>] kmem_cache_alloc_node+0x42/0x600
[ 3.420156]  [<ffffffff8110432d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[ 3.420170]  [<ffffffff8163183b>] __alloc_skb+0x5b/0x1d0
[ 3.420170]  [<ffffffff81144f6b>] audit_log_start+0x29b/0x480
[ 3.420170]  [<ffffffff810a2925>] ? __lock_task_sighand+0x95/0x270
[ 3.420170]  [<ffffffff81145cc9>] audit_log_cap_use+0x39/0xf0
[ 3.420170]  [<ffffffff8109cd75>] ns_capable+0x45/0x70
[ 3.420170]  [<ffffffff8109cdb7>] capable+0x17/0x20
[ 3.420170]  [<ffffffff812a2f50>] oom_score_adj_write+0x150/0x2f0
[ 3.420170]  [<ffffffff81230997>] __vfs_write+0x37/0x160
[ 3.420170]  [<ffffffff810e33b7>] ? update_fast_ctr+0x17/0x30
[ 3.420170]  [<ffffffff810e3449>] ? percpu_down_read+0x49/0x90
[ 3.420170]  [<ffffffff81233d47>] ? __sb_start_write+0xb7/0xf0
[ 3.420170]  [<ffffffff81233d47>] ? __sb_start_write+0xb7/0xf0
[ 3.420170]  [<ffffffff81231048>] vfs_write+0xb8/0x1b0
[ 3.420170]  [<ffffffff812533c6>] ? __fget_light+0x66/0x90
[ 3.420170]  [<ffffffff81232078>] SyS_write+0x58/0xc0
[ 3.420170]  [<ffffffff81001f2c>] do_syscall_64+0x5c/0x300
[ 3.420170]  [<ffffffff81849c9a>] entry_SYSCALL64_slow_path+0x25/0x25
[ 3.420170] ---[ end trace fb586899fb556a5e ]---
[ 3.447922] random: systemd-udevd urandom read with 3 bits of entropy available
[ 4.014078] clocksource: Switched to clocksource tsc
Begin: Loading essential drivers ... done.

This is with qemu, and the boot continues normally. On a real computer,
there is no such output and the system just seems to freeze.

Could it be possible that the deadlock happens because there's some IO
towards /sys/fs/cgroup, which causes a capability check, and that in turn
causes locking problems when we try to print the cgroup list?

-Topi
On Thu 2016-07-07 20:27:13, Topi Miettinen wrote:
> On 07/07/16 09:16, Petr Mladek wrote:
> > [...]
>
> Thanks a lot! I caught this stack dump:
>
> starting version 230
> [ 3.416647] ------------[ cut here ]------------
> [ 3.417310] WARNING: CPU: 0 PID: 95 at
> /home/topi/d/linux.git/kernel/locking/lockdep.c:2871
> lockdep_trace_alloc+0xb4/0xc0
> [ 3.417605] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> [...]
> [ 3.420170] [<ffffffff81144f6b>] audit_log_start+0x29b/0x480
> [ 3.420170] [<ffffffff810a2925>] ? __lock_task_sighand+0x95/0x270
> [ 3.420170] [<ffffffff81145cc9>] audit_log_cap_use+0x39/0xf0
> [ 3.420170] [<ffffffff8109cd75>] ns_capable+0x45/0x70
> [ 3.420170] [<ffffffff8109cdb7>] capable+0x17/0x20
> [ 3.420170] [<ffffffff812a2f50>] oom_score_adj_write+0x150/0x2f0
> [ 3.420170] [<ffffffff81230997>] __vfs_write+0x37/0x160
> [...]
> [ 3.420170] [<ffffffff81231048>] vfs_write+0xb8/0x1b0
> [...]
>
> This is with qemu and the boot continues normally. With real computer,
> there's no such output and system just seems to freeze.
>
> Could it be possible that the deadlock happens because there's some IO
> towards /sys/fs/cgroup, which causes a capability check and that in turn
> causes locking problems when we try to print cgroup list?

The above warning is printed by the code from
kernel/locking/lockdep.c:2871

static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
{
[...]
	/* We're only interested __GFP_FS allocations for now */
	if (!(gfp_mask & __GFP_FS))
		return;

	/*
	 * Oi! Can't be having __GFP_FS allocations with IRQs disabled.
	 */
	if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
		return;

The backtrace shows that your new audit_log_cap_use() is called
from vfs_write(). You might try to use audit_log_start() with
GFP_NOFS instead of GFP_KERNEL.

Note that this is rather intuitive advice. I still need to learn a lot
about memory management and kernel in general to be more sure about
a correct solution.

Best Regards,
Petr
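
To make the quoted check concrete, here is a small userspace model of the two tests in the excerpt. It is a sketch only: the `DEMO_GFP_*` values are made up for illustration and are not the kernel's definitions, and `demo_trace_alloc_warns` is a hypothetical helper. The one property carried over from the kernel is that GFP_KERNEL includes the FS bit while GFP_NOFS does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; the real kernel definitions differ. */
#define DEMO_GFP_IO     0x1u
#define DEMO_GFP_FS     0x2u
#define DEMO_GFP_KERNEL (DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS   (DEMO_GFP_IO)

/* Returns true when the modeled check would emit the warning. */
bool demo_trace_alloc_warns(unsigned int gfp_mask, bool irqs_disabled)
{
	/* Allocations that may not recurse into the filesystem are skipped. */
	if (!(gfp_mask & DEMO_GFP_FS))
		return false;

	/* An FS-capable allocation with IRQs disabled is reported. */
	return irqs_disabled;
}
```

In this model `demo_trace_alloc_warns(DEMO_GFP_KERNEL, true)` reports a problem and `demo_trace_alloc_warns(DEMO_GFP_NOFS, true)` does not. That only shows why the suggested flag change would take the early return in the excerpt; it says nothing about why IRQs were disabled at the call site in the first place.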
On 07/08/16 09:13, Petr Mladek wrote:
> On Thu 2016-07-07 20:27:13, Topi Miettinen wrote:
>> [...]
>
> The backtrace shows that your new audit_log_cap_use() is called
> from vfs_write(). You might try to use audit_log_start() with
> GFP_NOFS instead of GFP_KERNEL.
>
> Note that this is rather intuitive advice. I still need to learn a lot
> about memory management and kernel in general to be more sure about
> a correct solution.

Here's what I got now:

[ 18.043181]
[ 18.044123] ======================================================
[ 18.044123] [ INFO: possible circular locking dependency detected ]
[ 18.044123] 4.7.0-rc5+ #99 Not tainted
[ 18.044123] -------------------------------------------------------
[ 18.044123] systemd/1 is trying to acquire lock:
[ 18.044123]  (tasklist_lock){.+.+..}, at: [<ffffffff81137ae1>] cgroup_mount+0x4f1/0xc10
[ 18.044123]
[ 18.044123] but task is already holding lock:
[ 18.044123]  (css_set_lock){......}, at: [<ffffffff81137a9d>] cgroup_mount+0x4ad/0xc10
[ 18.044123]
[ 18.044123] which lock already depends on the new lock.
[ 18.044123]
[ 18.044123]
[ 18.044123] the existing dependency chain (in reverse order) is:
[ 18.044123] -> #3 (css_set_lock){......}:
[ 18.044123]        [<ffffffff810e92b3>] lock_acquire+0xe3/0x1c0
[ 18.044123]        [<ffffffff8184e187>] _raw_spin_lock_irq+0x37/0x50
[ 18.044123]        [<ffffffff811374be>] cgroup_setup_root+0x19e/0x2d0
[ 18.044123]        [<ffffffff821911fc>] cgroup_init+0xec/0x41d
[ 18.044123]        [<ffffffff82171f68>] start_kernel+0x40c/0x465
[ 18.044123]        [<ffffffff82171294>] x86_64_start_reservations+0x2f/0x31
[ 18.044123]        [<ffffffff8217140e>] x86_64_start_kernel+0x178/0x18b
[ 18.044123] -> #2 (cgroup_mutex){+.+...}:
[ 18.044123]        [<ffffffff810e92b3>] lock_acquire+0xe3/0x1c0
[ 18.044123]        [<ffffffff8184afaf>] mutex_lock_nested+0x5f/0x350
[ 18.044123]        [<ffffffff8113967a>] audit_cgroup_list+0x4a/0x2f0
[ 18.044123]        [<ffffffff81145d69>] audit_log_cap_use+0xd9/0xf0
[ 18.044123]        [<ffffffff8109cd75>] ns_capable+0x45/0x70
[ 18.044123]        [<ffffffff8109cdb7>] capable+0x17/0x20
[ 18.044123]        [<ffffffff812a2f50>] oom_score_adj_write+0x150/0x2f0
[ 18.044123]        [<ffffffff81230997>] __vfs_write+0x37/0x160
[ 18.044123]        [<ffffffff81231048>] vfs_write+0xb8/0x1b0
[ 18.044123]        [<ffffffff81232078>] SyS_write+0x58/0xc0
[ 18.044123]        [<ffffffff81001f2c>] do_syscall_64+0x5c/0x300
[ 18.044123]        [<ffffffff8184ea5a>] return_from_SYSCALL_64+0x0/0x7a
[ 18.044123] -> #1 (&(&sighand->siglock)->rlock){+.+...}:
[ 18.044123]        [<ffffffff810e92b3>] lock_acquire+0xe3/0x1c0
[ 18.044123]        [<ffffffff8184e011>] _raw_spin_lock+0x31/0x40
[ 18.044123]        [<ffffffff810901d9>] copy_process.part.34+0x10f9/0x1b40
[ 18.044123]        [<ffffffff81090e23>] _do_fork+0xf3/0x6b0
[ 18.044123]        [<ffffffff81091409>] kernel_thread+0x29/0x30
[ 18.044123]        [<ffffffff810b71d7>] kthreadd+0x187/0x1e0
[ 18.044123]        [<ffffffff8184ebbf>] ret_from_fork+0x1f/0x40
[ 18.044123] -> #0 (tasklist_lock){.+.+..}:
[ 18.044123]        [<ffffffff810e8dfb>] __lock_acquire+0x13cb/0x1440
[ 18.044123]        [<ffffffff810e92b3>] lock_acquire+0xe3/0x1c0
[ 18.044123]        [<ffffffff8184e444>] _raw_read_lock+0x34/0x50
[ 18.044123]        [<ffffffff81137ae1>] cgroup_mount+0x4f1/0xc10
[ 18.044123]        [<ffffffff81234de8>] mount_fs+0x38/0x170
[ 18.044123]        [<ffffffff812562bb>] vfs_kern_mount+0x6b/0x150
[ 18.044123]        [<ffffffff81258fdc>] do_mount+0x24c/0xe30
[ 18.044123]        [<ffffffff81259ef5>] SyS_mount+0x95/0xe0
[ 18.044123]        [<ffffffff8184e9a5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 18.044123]
[ 18.044123] other info that might help us debug this:
[ 18.044123]
[ 18.044123] Chain exists of: tasklist_lock --> cgroup_mutex --> css_set_lock
[ 18.044123]  Possible unsafe locking scenario:
[ 18.044123]
[ 18.044123]        CPU0                    CPU1
[ 18.044123]        ----                    ----
[ 18.044123]   lock(css_set_lock);
[ 18.044123]                                lock(cgroup_mutex);
[ 18.044123]                                lock(css_set_lock);
[ 18.044123]   lock(tasklist_lock);
[ 18.044123]
[ 18.044123]  *** DEADLOCK ***
[ 18.044123]
[ 18.044123] 1 lock held by systemd/1:
[ 18.044123]  #0:  (css_set_lock){......}, at: [<ffffffff81137a9d>] cgroup_mount+0x4ad/0xc10
[ 18.044123]
[ 18.044123] stack backtrace:
[ 18.044123] CPU: 0 PID: 1 Comm: systemd Not tainted 4.7.0-rc5+ #99
[ 18.044123] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 18.044123]  0000000000000086 0000000008966b11 ffff880006d13bb0 ffffffff813c9c45
[ 18.044123]  ffffffff829dbed0 ffffffff829cf2a0 ffff880006d13bf0 ffffffff810e60a3
[ 18.044123]  ffff880006d13c30 ffff880006d067b0 ffff880006d06040 0000000000000001
[ 18.044123] Call Trace:
[ 18.044123]  [<ffffffff813c9c45>] dump_stack+0x67/0x92
[ 18.044123]  [<ffffffff810e60a3>] print_circular_bug+0x1e3/0x250
[ 18.044123]  [<ffffffff810e8dfb>] __lock_acquire+0x13cb/0x1440
[ 18.044123]  [<ffffffff810e92b3>] lock_acquire+0xe3/0x1c0
[ 18.044123]  [<ffffffff81137ae1>] ? cgroup_mount+0x4f1/0xc10
[ 18.044123]  [<ffffffff8184e444>] _raw_read_lock+0x34/0x50
[ 18.044123]  [<ffffffff81137ae1>] ? cgroup_mount+0x4f1/0xc10
[ 18.044123]  [<ffffffff81137ae1>] cgroup_mount+0x4f1/0xc10
[ 18.044123]  [<ffffffff810e5637>] ? lockdep_init_map+0x57/0x1f0
[ 18.044123]  [<ffffffff81234de8>] mount_fs+0x38/0x170
[ 18.044123]  [<ffffffff812562bb>] vfs_kern_mount+0x6b/0x150
[ 18.044123]  [<ffffffff81258fdc>] do_mount+0x24c/0xe30
[ 18.044123]  [<ffffffff8121060b>] ? kmem_cache_alloc_trace+0x28b/0x5e0
[ 18.044123]  [<ffffffff811cc1c6>] ? strndup_user+0x46/0x80
[ 18.044123]  [<ffffffff81259ef5>] SyS_mount+0x95/0xe0
[ 18.044123]  [<ffffffff8184e9a5>] entry_SYSCALL_64_fastpath+0x18/0xa8

This is with GFP_KERNEL changed to GFP_NOFS for both allocations.

-Topi
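
The report above is about lock ordering: each "held while acquiring" pair is remembered as an edge, and a new acquisition is flagged when it would close a cycle. The following is a minimal userspace sketch of that idea, assuming nothing about lockdep's real data structures. The enum, `demo_edge` and `demo_acquire_order_ok` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The four locks named in the dependency chain of the report. */
enum demo_lock { TASKLIST, SIGLOCK, CGROUP_MUTEX, CSS_SET, DEMO_NLOCKS };

/* demo_edge[a][b] is true when b has been acquired while a was held. */
static bool demo_edge[DEMO_NLOCKS][DEMO_NLOCKS];

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static bool demo_reachable(enum demo_lock from, enum demo_lock to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < DEMO_NLOCKS; next++)
		if (demo_edge[from][next] && !seen[next] &&
		    demo_reachable((enum demo_lock)next, to, seen))
			return true;
	return false;
}

/* Record "held -> wanted"; return false if that would close a cycle. */
bool demo_acquire_order_ok(enum demo_lock held, enum demo_lock wanted)
{
	bool seen[DEMO_NLOCKS] = { false };

	/* If `wanted` already leads back to `held`, the order is circular. */
	if (demo_reachable(wanted, held, seen))
		return false;
	demo_edge[held][wanted] = true;
	return true;
}
```

Recording the chain in the order the report lists it (tasklist_lock then siglock, siglock then cgroup_mutex, cgroup_mutex then css_set_lock) succeeds three times, and the fourth step, css_set_lock then tasklist_lock in cgroup_mount(), is rejected. Read this way, entry #2 of the report suggests that audit_cgroup_list() takes cgroup_mutex from the capable() call in oom_score_adj_write(), which is the edge the patch adds; that reading is an interpretation of the trace, not something the thread confirms.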
From 2d5248f91998873174dbcbcafe87e5b30c3858aa Mon Sep 17 00:00:00 2001
From: Topi Miettinen <toiwoton@gmail.com>
Date: Sat, 2 Jul 2016 16:25:20 +0300
Subject: [PATCH] capabilities: audit capability use

There are many basic ways to control processes, including capabilities,
cgroups and resource limits. However, there are far fewer ways to find
out useful values for the limits, except blind trial and error.

Currently, there is no way to know which capabilities are actually used.
Even the source code is only implicit, in-depth knowledge of each
capability must be used when analyzing a program to judge which
capabilities the program will exercise.

Generate an audit message when capabilities are used. This can then be
used to configure capability sets for services by a software developer,
maintainer or system administrator.

Test case demonstrating basic capability monitoring with the new
message type 1330 and how the cgroups are displayed (boot to rdshell):

BusyBox v1.22.1 (Debian 1:1.22.0-19) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs) cd /sys/fs
(initramfs) mount -t cgroup2 cgroup cgroup
[ 16.503902] audit_printk_skb: 4026 callbacks suppressed
[ 16.505059] audit: type=1330 audit(1467543885.733:469): cap_used=21 pid=214 auid=4294967295 uid=0 gid=0 ses=4294967295 cgroups=
[ 16.506845] audit: type=1330 audit(1467543885.733:469): cap_used=21 pid=214 auid=4294967295 uid=0 gid=0 ses=4294967295 cgroups=
[ 16.509234] audit: type=1300 audit(1467543885.733:469): arch=c000003e syscall=165 success=yes exit=0 a0=7ffc2f394e2d a1=7ffc2f394e34 a2=7ffc2f394e25 a3=8000 items=0 ppid=213 pid=214 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="mount" exe="/bin/mount" key=(null)
[ 16.510134] audit: type=1327 audit(1467543885.733:469): proctitle=6D6F756E74002D74006367726F757032006367726F7570006367726F7570
(initramfs) cd cgroup
(initramfs) mkdir test; cd test
[ 16.533829] audit: type=1330 audit(1467543885.765:470): cap_used=1 pid=215 auid=4294967295 uid=0 gid=0 ses=4294967295 cgroups=:/;
[ 16.536587] audit: type=1300 audit(1467543885.765:470): arch=c000003e syscall=83 success=yes exit=0 a0=7ffe4f0bfe29 a1=1ff a2=0 a3=1e2 items=0 ppid=213 pid=215 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="mkdir" exe="/bin/mkdir" key=(null)
[ 16.537263] audit: type=1327 audit(1467543885.765:470): proctitle=6D6B6469720074657374
(initramfs) echo $$ >cgroup.procs
(initramfs) mknod /dev/z_$$ c 1 2
[ 16.571516] audit: type=1330 audit(1467543885.801:471): cap_used=27 pid=216 auid=4294967295 uid=0 gid=0 ses=4294967295 cgroups=:/test;
[ 16.572812] audit: type=1300 audit(1467543885.801:471): arch=c000003e syscall=133 success=yes exit=0 a0=7ffe04fe3e11 a1=21b6 a2=102 a3=5c9 items=0 ppid=213 pid=216 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="mknod" exe="/bin/mknod" key=(null)
[ 16.573571] audit: type=1327 audit(1467543885.801:471): proctitle=6D6B6E6F64002F6465762F7A5F323133006300310032

Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
---
 include/linux/audit.h      |  4 +++
 include/linux/cgroup.h     |  2 ++
 include/uapi/linux/audit.h |  1 +
 kernel/audit.c             | 22 ++++++++++++++++
 kernel/capability.c        |  5 ++--
 kernel/cgroup.c            | 62 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index e38e3fc..971cb2e 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -438,6 +438,8 @@ static inline void audit_mmap_fd(int fd, int flags)
 		__audit_mmap_fd(fd, flags);
 }
 
+extern void audit_log_cap_use(int cap);
+
 extern int audit_n_rules;
 extern int audit_signals;
 #else /* CONFIG_AUDITSYSCALL */
@@ -545,6 +547,8 @@ static inline void audit_mmap_fd(int fd, int flags)
 { }
 static inline void audit_ptrace(struct task_struct *t)
 { }
+static inline void audit_log_cap_use(int cap)
+{ }
 #define audit_n_rules 0
 #define audit_signals 0
 #endif /* CONFIG_AUDITSYSCALL */
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a20320c..b5dc8aa 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -100,6 +100,8 @@ char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
 int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry);
 int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 		     struct pid *pid, struct task_struct *tsk);
+struct audit_buffer;
+void audit_cgroup_list(struct audit_buffer *ab);
 
 void cgroup_fork(struct task_struct *p);
 extern int cgroup_can_fork(struct task_struct *p);
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index d820aa9..a5c9a73 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -111,6 +111,7 @@
 #define AUDIT_PROCTITLE		1327	/* Proctitle emit event */
 #define AUDIT_FEATURE_CHANGE	1328	/* audit log listing feature changes */
 #define AUDIT_REPLACE		1329	/* Replace auditd if this packet unanswerd */
+#define AUDIT_CAPABILITY	1330	/* Record showing capability use */
 
 #define AUDIT_AVC		1400	/* SE Linux avc denial or grant */
 #define AUDIT_SELINUX_ERR	1401	/* Internal SE Linux Errors */
diff --git a/kernel/audit.c b/kernel/audit.c
index 8d528f9..370beb7 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -54,6 +54,7 @@
 #include <linux/kthread.h>
 #include <linux/kernel.h>
 #include <linux/syscalls.h>
+#include <linux/cgroup.h>
 
 #include <linux/audit.h>
 
@@ -1709,6 +1710,27 @@ static void audit_log_fcaps(struct audit_buffer *ab, struct audit_names *name)
 				 name->fcap.fE, name->fcap_ver);
 }
 
+void audit_log_cap_use(int cap)
+{
+	struct audit_context *context = current->audit_context;
+	struct audit_buffer *ab;
+	kuid_t uid;
+	kgid_t gid;
+
+	ab = audit_log_start(context, GFP_KERNEL, AUDIT_CAPABILITY);
+	audit_log_format(ab, "cap_used=%d", cap);
+	current_uid_gid(&uid, &gid);
+	audit_log_format(ab, " pid=%d auid=%u uid=%u gid=%u ses=%u",
+			 task_pid_nr(current),
+			 from_kuid(&init_user_ns, audit_get_loginuid(current)),
+			 from_kuid(&init_user_ns, uid),
+			 from_kgid(&init_user_ns, gid),
+			 audit_get_sessionid(current));
+	audit_log_format(ab, " cgroups=");
+	audit_cgroup_list(ab);
+	audit_log_end(ab);
+}
+
 static inline int audit_copy_fcaps(struct audit_names *name,
 				   const struct dentry *dentry)
 {
diff --git a/kernel/capability.c b/kernel/capability.c
index 45432b5..d45d5b1 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -366,8 +366,8 @@ bool has_capability_noaudit(struct task_struct *t, int cap)
  * @ns: The usernamespace we want the capability in
  * @cap: The capability to be tested for
  *
- * Return true if the current task has the given superior capability currently
- * available for use, false if not.
+ * Return true if the current task has the given superior capability
+ * currently available for use, false if not. Write an audit message.
  *
  * This sets PF_SUPERPRIV on the task if the capability is available on the
  * assumption that it's about to be used.
@@ -380,6 +380,7 @@ bool ns_capable(struct user_namespace *ns, int cap)
 	}
 
 	if (security_capable(current_cred(), ns, cap) == 0) {
+		audit_log_cap_use(cap);
 		current->flags |= PF_SUPERPRIV;
 		return true;
 	}
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 75c0ff0..3b92e85 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -63,6 +63,7 @@
 #include <linux/nsproxy.h>
 #include <linux/proc_ns.h>
 #include <net/sock.h>
+#include <linux/audit.h>
 
 /*
  * pidlists linger the following amount before being destroyed. The goal
@@ -5789,6 +5790,67 @@ out:
 	return retval;
 }
 
+/*
+ * audit_cgroup_list()
+ *  - Print task's cgroup paths with audit_log_format()
+ *  - Used for capability audit logging
+ *  - Otherwise very similar to proc_cgroup_show().
+ */
+void audit_cgroup_list(struct audit_buffer *ab)
+{
+	char *buf, *path;
+	struct cgroup_root *root;
+
+	buf = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (!buf)
+		return;
+
+	mutex_lock(&cgroup_mutex);
+	spin_lock_irq(&css_set_lock);
+
+	for_each_root(root) {
+		struct cgroup_subsys *ss;
+		struct cgroup *cgrp;
+		int ssid, count = 0;
+
+		if (root == &cgrp_dfl_root && !cgrp_dfl_visible)
+			continue;
+
+		if (root != &cgrp_dfl_root)
+			for_each_subsys(ss, ssid)
+				if (root->subsys_mask & (1 << ssid))
+					audit_log_format(ab, "%s%s",
+							 count++ ? "," : "",
+							 ss->legacy_name);
+		if (strlen(root->name))
+			audit_log_format(ab, "%sname=%s", count ? "," : "",
+					 root->name);
+		audit_log_format(ab, ":");
+
+		cgrp = task_cgroup_from_root(current, root);
+
+		if (cgroup_on_dfl(cgrp) || !(current->flags & PF_EXITING)) {
+			path = cgroup_path_ns_locked(cgrp, buf, PATH_MAX,
+						     current->nsproxy->cgroup_ns);
+			if (!path)
+				goto out_unlock;
+		} else
+			path = "/";
+
+		audit_log_format(ab, "%s", path);
+
+		if (cgroup_on_dfl(cgrp) && cgroup_is_dead(cgrp))
+			audit_log_format(ab, " (deleted);");
+		else
+			audit_log_format(ab, ";");
+	}
+
+out_unlock:
+	spin_unlock_irq(&css_set_lock);
+	mutex_unlock(&cgroup_mutex);
+	kfree(buf);
+}
+
 /* Display information about each subsystem and each hierarchy */
 static int proc_cgroupstats_show(struct seq_file *m, void *v)
 {
-- 
2.8.1