diff mbox

local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)

Message ID 20161005200522.GE19539@ZenIV.linux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Al Viro Oct. 5, 2016, 8:05 p.m. UTC
On Wed, Oct 05, 2016 at 02:57:04PM -0400, CAI Qian wrote:

> Not sure if this related, and there is always a lockdep regards procfs happened
> below unless masking by other lockdep issues before the cgroup hang. Also, this
> hang is always reproducible.

Sigh...  Let's get the /proc/*/auxv out of the way - this should deal with it:

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

CAI Qian Oct. 6, 2016, 12:20 p.m. UTC | #1
----- Original Message -----
> From: "Al Viro" <viro@ZenIV.linux.org.uk>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: "tj" <tj@kernel.org>, "Linus Torvalds" <torvalds@linux-foundation.org>, "Dave Chinner" <david@fromorbit.com>,
> "linux-xfs" <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>, "Nick Piggin" <npiggin@gmail.com>,
> linux-fsdevel@vger.kernel.org
> Sent: Wednesday, October 5, 2016 4:05:22 PM
> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> 
> On Wed, Oct 05, 2016 at 02:57:04PM -0400, CAI Qian wrote:
> 
> > Not sure if this related, and there is always a lockdep regards procfs
> > happened
> > below unless masking by other lockdep issues before the cgroup hang. Also,
> > this
> > hang is always reproducible.
> 
> Sigh...  Let's get the /proc/*/auxv out of the way - this should deal with
> it:
So I applied both this and the sanity patch, and both original sanity and the
proc warnings went away. However, the cgroup hang can still be reproduced as
well as this new xfs internal error below,

[16921.141233] XFS (dm-0): Internal error XFS_WANT_CORRUPTED_RETURN at line 5619 of file fs/xfs/libxfs/xfs_bmap.c.  Caller xfs_bmap_shift_extents+0x1cc/0x3a0 [xfs]
[16921.157694] CPU: 9 PID: 52920 Comm: trinity-c108 Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[16921.167012] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0044.R00.1501191641 01/19/2015
[16921.178368]  0000000000000286 00000000c3833246 ffff8803d0a83b60 ffffffff813d2ecc
[16921.186658]  ffff88042a898000 0000000000000001 ffff8803d0a83b78 ffffffffa02f36eb
[16921.194946]  ffffffffa02b544c ffff8803d0a83c30 ffffffffa02a8e52 ffff88042a898040
[16921.203238] Call Trace:
[16921.205972]  [<ffffffff813d2ecc>] dump_stack+0x85/0xc9
[16921.211742]  [<ffffffffa02f36eb>] xfs_error_report+0x3b/0x40 [xfs]
[16921.218660]  [<ffffffffa02b544c>] ? xfs_bmap_shift_extents+0x1cc/0x3a0 [xfs]
[16921.226543]  [<ffffffffa02a8e52>] xfs_bmse_shift_one.constprop.20+0x332/0x370 [xfs]
[16921.235090]  [<ffffffff817cb73a>] ? kmemleak_alloc+0x4a/0xa0
[16921.241426]  [<ffffffffa02b544c>] xfs_bmap_shift_extents+0x1cc/0x3a0 [xfs]
[16921.249122]  [<ffffffffa03142aa>] ? xfs_trans_add_item+0x2a/0x60 [xfs]
[16921.256430]  [<ffffffffa02eb361>] xfs_shift_file_space+0x231/0x2f0 [xfs]
[16921.263931]  [<ffffffffa02ebe8c>] xfs_collapse_file_space+0x5c/0x180 [xfs]
[16921.271622]  [<ffffffffa02f69b8>] xfs_file_fallocate+0x158/0x360 [xfs]
[16921.278907]  [<ffffffff810f8eae>] ? update_fast_ctr+0x4e/0x70
[16921.285320]  [<ffffffff810f8f57>] ? percpu_down_read+0x57/0x90
[16921.291828]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
[16921.298337]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
[16921.304847]  [<ffffffff8127e000>] vfs_fallocate+0x140/0x230
[16921.311067]  [<ffffffff8127eee4>] SyS_fallocate+0x44/0x70
[16921.317091]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[16921.323212]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25

    CAI Qian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
CAI Qian Oct. 6, 2016, 12:25 p.m. UTC | #2
----- Original Message -----
> From: "CAI Qian" <caiqian@redhat.com>
> To: "Al Viro" <viro@ZenIV.linux.org.uk>
> Cc: "tj" <tj@kernel.org>, "Linus Torvalds" <torvalds@linux-foundation.org>, "Dave Chinner" <david@fromorbit.com>,
> "linux-xfs" <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>, "Nick Piggin" <npiggin@gmail.com>,
> linux-fsdevel@vger.kernel.org
> Sent: Thursday, October 6, 2016 8:20:17 AM
> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> 
> 
> 
> ----- Original Message -----
> > From: "Al Viro" <viro@ZenIV.linux.org.uk>
> > To: "CAI Qian" <caiqian@redhat.com>
> > Cc: "tj" <tj@kernel.org>, "Linus Torvalds" <torvalds@linux-foundation.org>,
> > "Dave Chinner" <david@fromorbit.com>,
> > "linux-xfs" <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>,
> > "Nick Piggin" <npiggin@gmail.com>,
> > linux-fsdevel@vger.kernel.org
> > Sent: Wednesday, October 5, 2016 4:05:22 PM
> > Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT]
> > splice_read reworked)
> > 
> > On Wed, Oct 05, 2016 at 02:57:04PM -0400, CAI Qian wrote:
> > 
> > > Not sure if this related, and there is always a lockdep regards procfs
> > > happened
> > > below unless masking by other lockdep issues before the cgroup hang.
> > > Also,
> > > this
> > > hang is always reproducible.
> > 
> > Sigh...  Let's get the /proc/*/auxv out of the way - this should deal with
> > it:
> So I applied both this and the sanity patch, and both original sanity and the
> proc warnings went away. However, the cgroup hang can still be reproduced as
> well as this new xfs internal error below,

Wait. There is also a lockep happened before the xfs internal error as well.

[ 5839.452325] ======================================================
[ 5839.459221] [ INFO: possible circular locking dependency detected ]
[ 5839.466215] 4.8.0-rc8-splice-fixw-proc+ #4 Not tainted
[ 5839.471945] -------------------------------------------------------
[ 5839.478937] trinity-c220/69531 is trying to acquire lock:
[ 5839.484961]  (&p->lock){+.+.+.}, at: [<ffffffff812ac69c>] seq_read+0x4c/0x3e0
[ 5839.492967] 
but task is already holding lock:
[ 5839.499476]  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
[ 5839.508560] 
which lock already depends on the new lock.

[ 5839.517686] 
the existing dependency chain (in reverse order) is:
[ 5839.526036] 
-> #3 (sb_writers#8){.+.+.+}:
[ 5839.530751]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
[ 5839.537368]        [<ffffffff810f8f4a>] percpu_down_read+0x4a/0x90
[ 5839.544275]        [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
[ 5839.551181]        [<ffffffff812a8544>] mnt_want_write+0x24/0x50
[ 5839.557892]        [<ffffffffa04a398f>] ovl_want_write+0x1f/0x30 [overlay]
[ 5839.565577]        [<ffffffffa04a6036>] ovl_do_remove+0x46/0x480 [overlay]
[ 5839.573259]        [<ffffffffa04a64a3>] ovl_unlink+0x13/0x20 [overlay]
[ 5839.580555]        [<ffffffff812918ea>] vfs_unlink+0xda/0x190
[ 5839.586979]        [<ffffffff81293698>] do_unlinkat+0x268/0x2b0
[ 5839.593599]        [<ffffffff8129419b>] SyS_unlinkat+0x1b/0x30
[ 5839.600120]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 5839.606836]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
[ 5839.614231] 
-> #2 (&sb->s_type->i_mutex_key#17){++++++}:
[ 5839.620399]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
[ 5839.627015]        [<ffffffff817d1b77>] down_read+0x47/0x70
[ 5839.633242]        [<ffffffff8128cfd2>] lookup_slow+0xc2/0x1f0
[ 5839.639762]        [<ffffffff8128f6f2>] walk_component+0x172/0x220
[ 5839.646668]        [<ffffffff81290fd6>] link_path_walk+0x1a6/0x620
[ 5839.653574]        [<ffffffff81291a81>] path_openat+0xe1/0xdb0
[ 5839.660092]        [<ffffffff812939e1>] do_filp_open+0x91/0x100
[ 5839.666707]        [<ffffffff81288e06>] do_open_execat+0x76/0x180
[ 5839.673517]        [<ffffffff81288f3b>] open_exec+0x2b/0x50
[ 5839.679743]        [<ffffffff812eccf3>] load_elf_binary+0x2a3/0x10a0
[ 5839.686844]        [<ffffffff81288917>] search_binary_handler+0x97/0x1d0
[ 5839.694331]        [<ffffffff81289ed8>] do_execveat_common.isra.35+0x678/0x9a0
[ 5839.702400]        [<ffffffff8128a4da>] SyS_execve+0x3a/0x50
[ 5839.708726]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 5839.715441]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
[ 5839.722833] 
-> #1 (&sig->cred_guard_mutex){+.+.+.}:
[ 5839.728510]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
[ 5839.735126]        [<ffffffff817cfc66>] mutex_lock_killable_nested+0x86/0x540
[ 5839.743097]        [<ffffffff81301e84>] lock_trace+0x24/0x60
[ 5839.749421]        [<ffffffff8130224d>] proc_pid_syscall+0x2d/0x110
[ 5839.756423]        [<ffffffff81302af0>] proc_single_show+0x50/0x90
[ 5839.763330]        [<ffffffff812ab867>] traverse+0xf7/0x210
[ 5839.769557]        [<ffffffff812ac9eb>] seq_read+0x39b/0x3e0
[ 5839.775884]        [<ffffffff81280573>] do_loop_readv_writev+0x83/0xc0
[ 5839.783179]        [<ffffffff81281a03>] do_readv_writev+0x213/0x230
[ 5839.790181]        [<ffffffff81281a59>] vfs_readv+0x39/0x50
[ 5839.796406]        [<ffffffff81281c12>] do_preadv+0xa2/0xc0
[ 5839.802634]        [<ffffffff81282ec1>] SyS_preadv+0x11/0x20
[ 5839.808963]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 5839.815681]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
[ 5839.823075] 
-> #0 (&p->lock){+.+.+.}:
[ 5839.827395]        [<ffffffff810fe69c>] __lock_acquire+0x151c/0x1990
[ 5839.834500]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
[ 5839.841115]        [<ffffffff817cf3b6>] mutex_lock_nested+0x76/0x450
[ 5839.848219]        [<ffffffff812ac69c>] seq_read+0x4c/0x3e0
[ 5839.854448]        [<ffffffff8131566b>] kernfs_fop_read+0x12b/0x1b0
[ 5839.861451]        [<ffffffff81280573>] do_loop_readv_writev+0x83/0xc0
[ 5839.868742]        [<ffffffff81281a03>] do_readv_writev+0x213/0x230
[ 5839.875744]        [<ffffffff81281a59>] vfs_readv+0x39/0x50
[ 5839.881971]        [<ffffffff812bc55a>] default_file_splice_read+0x1aa/0x2c0
[ 5839.889847]        [<ffffffff812bb913>] do_splice_to+0x73/0x90
[ 5839.896365]        [<ffffffff812bba1b>] splice_direct_to_actor+0xeb/0x220
[ 5839.903950]        [<ffffffff812bbbd9>] do_splice_direct+0x89/0xd0
[ 5839.910857]        [<ffffffff8128261e>] do_sendfile+0x1ce/0x3b0
[ 5839.917470]        [<ffffffff812831df>] SyS_sendfile64+0x6f/0xd0
[ 5839.924184]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 5839.930898]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
[ 5839.938286] 
other info that might help us debug this:

[ 5839.947217] Chain exists of:
  &p->lock --> &sb->s_type->i_mutex_key#17 --> sb_writers#8

[ 5839.956615]  Possible unsafe locking scenario:

[ 5839.963218]        CPU0                    CPU1
[ 5839.968269]        ----                    ----
[ 5839.973321]   lock(sb_writers#8);
[ 5839.977046]                                lock(&sb->s_type->i_mutex_key#17);
[ 5839.985037]                                lock(sb_writers#8);
[ 5839.991573]   lock(&p->lock);
[ 5839.994900] 
 *** DEADLOCK ***

[ 5840.001503] 1 lock held by trinity-c220/69531:
[ 5840.006457]  #0:  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
[ 5840.016031] 
stack backtrace:
[ 5840.020891] CPU: 12 PID: 69531 Comm: trinity-c220 Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 5840.030306] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0044.R00.1501191641 01/19/2015
[ 5840.041660]  0000000000000086 00000000a1ef62f8 ffff8803ca52f7c0 ffffffff813d2ecc
[ 5840.049952]  ffffffff82a41160 ffffffff82a913e0 ffff8803ca52f800 ffffffff811dd630
[ 5840.058245]  ffff8803ca52f840 ffff880392c4ecc8 ffff880392c4e000 0000000000000001
[ 5840.066537] Call Trace:
[ 5840.069266]  [<ffffffff813d2ecc>] dump_stack+0x85/0xc9
[ 5840.075000]  [<ffffffff811dd630>] print_circular_bug+0x1f9/0x207
[ 5840.081701]  [<ffffffff810fe69c>] __lock_acquire+0x151c/0x1990
[ 5840.088208]  [<ffffffff810ff174>] lock_acquire+0xd4/0x240
[ 5840.094232]  [<ffffffff812ac69c>] ? seq_read+0x4c/0x3e0
[ 5840.100061]  [<ffffffff812ac69c>] ? seq_read+0x4c/0x3e0
[ 5840.105891]  [<ffffffff817cf3b6>] mutex_lock_nested+0x76/0x450
[ 5840.112397]  [<ffffffff812ac69c>] ? seq_read+0x4c/0x3e0
[ 5840.118228]  [<ffffffff810fb3e9>] ? __lock_is_held+0x49/0x70
[ 5840.124540]  [<ffffffff812ac69c>] seq_read+0x4c/0x3e0
[ 5840.130175]  [<ffffffff81315540>] ? kernfs_vma_page_mkwrite+0x90/0x90
[ 5840.137360]  [<ffffffff8131566b>] kernfs_fop_read+0x12b/0x1b0
[ 5840.143770]  [<ffffffff81315540>] ? kernfs_vma_page_mkwrite+0x90/0x90
[ 5840.150956]  [<ffffffff81280573>] do_loop_readv_writev+0x83/0xc0
[ 5840.157657]  [<ffffffff81315540>] ? kernfs_vma_page_mkwrite+0x90/0x90
[ 5840.164843]  [<ffffffff81281a03>] do_readv_writev+0x213/0x230
[ 5840.171255]  [<ffffffff81418cf9>] ? __pipe_get_pages+0x24/0x9b
[ 5840.177762]  [<ffffffff813e6f0f>] ? iov_iter_get_pages_alloc+0x19f/0x360
[ 5840.185240]  [<ffffffff810fd5f2>] ? __lock_acquire+0x472/0x1990
[ 5840.191843]  [<ffffffff81281a59>] vfs_readv+0x39/0x50
[ 5840.197478]  [<ffffffff812bc55a>] default_file_splice_read+0x1aa/0x2c0
[ 5840.204763]  [<ffffffff810cba89>] ? __might_sleep+0x49/0x80
[ 5840.210980]  [<ffffffff81349c93>] ? security_file_permission+0xa3/0xc0
[ 5840.218264]  [<ffffffff812bb913>] do_splice_to+0x73/0x90
[ 5840.224190]  [<ffffffff812bba1b>] splice_direct_to_actor+0xeb/0x220
[ 5840.231182]  [<ffffffff812baee0>] ? generic_pipe_buf_nosteal+0x10/0x10
[ 5840.238465]  [<ffffffff812bbbd9>] do_splice_direct+0x89/0xd0
[ 5840.244778]  [<ffffffff8128261e>] do_sendfile+0x1ce/0x3b0
[ 5840.250802]  [<ffffffff812831df>] SyS_sendfile64+0x6f/0xd0
[ 5840.256922]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 5840.263042]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25

   CAI Qian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
CAI Qian Oct. 6, 2016, 4:11 p.m. UTC | #3
> > > On Wed, Oct 05, 2016 at 02:57:04PM -0400, CAI Qian wrote:
> > > 
> > > > Not sure if this related, and there is always a lockdep regards procfs
> > > > happened
> > > > below unless masking by other lockdep issues before the cgroup hang.
> > > > Also,
> > > > this
> > > > hang is always reproducible.
> > > 
> > > Sigh...  Let's get the /proc/*/auxv out of the way - this should deal
> > > with
> > > it:
> > So I applied both this and the sanity patch, and both original sanity and
> > the
> > proc warnings went away. However, the cgroup hang can still be reproduced
> > as
> > well as this new xfs internal error below,
> 
> Wait. There is also a lockep happened before the xfs internal error as well.
Some other lockdep this time,

[ 4872.310639] =================================
[ 4872.315499] [ INFO: inconsistent lock state ]
[ 4872.320359] 4.8.0-rc8-splice-fixw-proc+ #4 Not tainted
[ 4872.326091] ---------------------------------
[ 4872.330950] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 4872.338235] kswapd1/437 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 4872.343965]  (&xfs_nondir_ilock_class){++++?.}, at: [<ffffffffa029968e>] xfs_ilock+0x18e/0x260 [xfs]
[ 4872.354236] {RECLAIM_FS-ON-W} state was registered at:
[ 4872.359969]   [<ffffffff810fcbd6>] mark_held_locks+0x66/0x90
[ 4872.366297]   [<ffffffff810fffd5>] lockdep_trace_alloc+0xc5/0x110
[ 4872.373107]   [<ffffffff81253ad3>] kmem_cache_alloc+0x33/0x2e0
[ 4872.379628]   [<ffffffffa02a8386>] kmem_zone_alloc+0x96/0x120 [xfs]
[ 4872.386654]   [<ffffffffa024967b>] xfs_bmbt_init_cursor+0x3b/0x160 [xfs]
[ 4872.394147]   [<ffffffffa0247f8f>] xfs_bunmapi+0x80f/0xb00 [xfs]
[ 4872.400202] kmemleak: Cannot allocate a kmemleak_object structure
[ 4872.400205] kmemleak: Kernel memory leak detector disabled
[ 4872.400337] kmemleak: Automatic memory scanning thread ended
[ 4872.400869] kmemleak: Kmemleak disabled without freeing internal data. Reclaim the memory with "echo clear > /sys/kernel/debug/kmemleak".
[ 4872.433878]   [<ffffffffa027ddc3>] xfs_bmap_punch_delalloc_range+0xe3/0x180 [xfs]
[ 4872.442253]   [<ffffffffa0294b39>] xfs_file_iomap_end+0x89/0xd0 [xfs]
[ 4872.449468]   [<ffffffff812f3da0>] iomap_apply+0xe0/0x130
[ 4872.455505]   [<ffffffff812f3e58>] iomap_file_buffered_write+0x68/0xa0
[ 4872.462798]   [<ffffffffa028a87f>] xfs_file_buffered_aio_write+0x14f/0x350 [xfs]
[ 4872.471079]   [<ffffffffa028ab6d>] xfs_file_write_iter+0xed/0x130 [xfs]
[ 4872.478485]   [<ffffffff81280eee>] do_iter_readv_writev+0xae/0x130
[ 4872.485393]   [<ffffffff81281992>] do_readv_writev+0x1a2/0x230
[ 4872.491911]   [<ffffffff81281c6c>] vfs_writev+0x3c/0x50
[ 4872.497752]   [<ffffffff81281ce4>] do_writev+0x64/0x100
[ 4872.503589]   [<ffffffff81282ea0>] SyS_writev+0x10/0x20
[ 4872.509428]   [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 4872.515656]   [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
[ 4872.522563] irq event stamp: 427
[ 4872.526160] hardirqs last  enabled at (427): [<ffffffff817cf21d>] mutex_trylock+0xdd/0x200
[ 4872.535393] hardirqs last disabled at (426): [<ffffffff817cf191>] mutex_trylock+0x51/0x200
[ 4872.544627] softirqs last  enabled at (424): [<ffffffff817d7b37>] __do_softirq+0x1f7/0x4b7
[ 4872.553862] softirqs last disabled at (417): [<ffffffff810a4a98>] irq_exit+0xc8/0xe0
[ 4872.562513] 
[ 4872.562513] other info that might help us debug this:
[ 4872.569797]  Possible unsafe locking scenario:
[ 4872.569797] 
[ 4872.576401]        CPU0
[ 4872.579127]        ----
[ 4872.581854]   lock(&xfs_nondir_ilock_class);
[ 4872.586637]   <Interrupt>
[ 4872.589558]     lock(&xfs_nondir_ilock_class);
[ 4872.594533] 
[ 4872.594533]  *** DEADLOCK ***
[ 4872.594533] 
[ 4872.601140] 3 locks held by kswapd1/437:
[ 4872.605515]  #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811f78ad>] shrink_slab+0x9d/0x620
[ 4872.614889]  #1:  (&type->s_umount_key#48){++++++}, at: [<ffffffff8128550b>] trylock_super+0x1b/0x50
[ 4872.625145]  #2:  (&pag->pag_ici_reclaim_lock){+.+...}, at: [<ffffffffa028e7a7>] xfs_reclaim_inodes_ag+0xc7/0x4f0 [xfs]
[ 4872.637247] 
[ 4872.637247] stack backtrace:
[ 4872.642109] CPU: 49 PID: 437 Comm: kswapd1 Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 4872.650846] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0044.R00.1501191641 01/19/2015
[ 4872.662202]  0000000000000086 00000000eda15d18 ffff880462bd7798 ffffffff813d2ecc
[ 4872.670498]  ffff880462e56000 ffffffff82a66870 ffff880462bd77e8 ffffffff811dd9e1
[ 4872.678793]  0000000000000000 ffff880400000001 ffff880400000001 000000000000000a
[ 4872.687086] Call Trace:
[ 4872.689817]  [<ffffffff813d2ecc>] dump_stack+0x85/0xc9
[ 4872.695543]  [<ffffffff811dd9e1>] print_usage_bug+0x1eb/0x1fc
[ 4872.701954]  [<ffffffff810fc0b0>] ? check_usage_backwards+0x150/0x150
[ 4872.709141]  [<ffffffff810fcae4>] mark_lock+0x264/0x2f0
[ 4872.714968]  [<ffffffff810fd491>] __lock_acquire+0x311/0x1990
[ 4872.721379]  [<ffffffff810499db>] ? save_stack_trace+0x2b/0x50
[ 4872.727892]  [<ffffffff810fd5f2>] ? __lock_acquire+0x472/0x1990
[ 4872.734497]  [<ffffffff810ff174>] lock_acquire+0xd4/0x240
[ 4872.740535]  [<ffffffffa029968e>] ? xfs_ilock+0x18e/0x260 [xfs]
[ 4872.747155]  [<ffffffffa028dd93>] ? xfs_reclaim_inode+0x113/0x380 [xfs]
[ 4872.754538]  [<ffffffff810f8bfa>] down_write_nested+0x4a/0x80
[ 4872.760962]  [<ffffffffa029968e>] ? xfs_ilock+0x18e/0x260 [xfs]
[ 4872.767579]  [<ffffffffa029968e>] xfs_ilock+0x18e/0x260 [xfs]
[ 4872.774004]  [<ffffffffa028dd93>] xfs_reclaim_inode+0x113/0x380 [xfs]
[ 4872.781203]  [<ffffffffa028e9ab>] xfs_reclaim_inodes_ag+0x2cb/0x4f0 [xfs]
[ 4872.788780]  [<ffffffffa028e7d2>] ? xfs_reclaim_inodes_ag+0xf2/0x4f0 [xfs]
[ 4872.796453]  [<ffffffff817d40aa>] ? _raw_spin_unlock_irqrestore+0x6a/0x80
[ 4872.804026]  [<ffffffff817d408a>] ? _raw_spin_unlock_irqrestore+0x4a/0x80
[ 4872.811602]  [<ffffffff810d1a58>] ? try_to_wake_up+0x58/0x510
[ 4872.818014]  [<ffffffff810d1f25>] ? wake_up_process+0x15/0x20
[ 4872.824438]  [<ffffffffa0290523>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[ 4872.831835]  [<ffffffffa02a26d9>] xfs_fs_free_cached_objects+0x19/0x20 [xfs]
[ 4872.839702]  [<ffffffff812856c1>] super_cache_scan+0x181/0x190
[ 4872.846210]  [<ffffffff811f7a79>] shrink_slab+0x269/0x620
[ 4872.852233]  [<ffffffff811fcc88>] shrink_node+0x108/0x310
[ 4872.858256]  [<ffffffff811fe360>] kswapd+0x3d0/0x960
[ 4872.863796]  [<ffffffff811fdf90>] ? mem_cgroup_shrink_node+0x370/0x370
[ 4872.871081]  [<ffffffff810c3f5e>] kthread+0xfe/0x120
[ 4872.876618]  [<ffffffff817d40ec>] ? _raw_spin_unlock_irq+0x2c/0x60
[ 4872.883514]  [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40
[ 4872.889539]  [<ffffffff810c3e60>] ? kthread_create_on_node+0x230/0x230
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Oct. 6, 2016, 5 p.m. UTC | #4
On Thu, Oct 6, 2016 at 9:11 AM, CAI Qian <caiqian@redhat.com> wrote:
>
>>
>> Wait. There is also a lockep happened before the xfs internal error as well.
> Some other lockdep this time,

This one looks just bogus.

> [ 4872.569797]  Possible unsafe locking scenario:
> [ 4872.569797]
> [ 4872.576401]        CPU0
> [ 4872.579127]        ----
> [ 4872.581854]   lock(&xfs_nondir_ilock_class);
> [ 4872.586637]   <Interrupt>
> [ 4872.589558]     lock(&xfs_nondir_ilock_class);

I'm not seeing that .lock taken in interrupt context.

I'm wondering how many of your reports are confused by earlier errors
that  happened.

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
CAI Qian Oct. 6, 2016, 6:12 p.m. UTC | #5
----- Original Message -----
> From: "Linus Torvalds" <torvalds@linux-foundation.org>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: "Al Viro" <viro@zeniv.linux.org.uk>, "tj" <tj@kernel.org>, "Dave Chinner" <david@fromorbit.com>, "linux-xfs"
> <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>, "Nick Piggin" <npiggin@gmail.com>, "linux-fsdevel"
> <linux-fsdevel@vger.kernel.org>
> Sent: Thursday, October 6, 2016 1:00:08 PM
> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> 
> On Thu, Oct 6, 2016 at 9:11 AM, CAI Qian <caiqian@redhat.com> wrote:
> >
> >>
> >> Wait. There is also a lockep happened before the xfs internal error as
> >> well.
> > Some other lockdep this time,
> 
> This one looks just bogus.
> 
> > [ 4872.569797]  Possible unsafe locking scenario:
> > [ 4872.569797]
> > [ 4872.576401]        CPU0
> > [ 4872.579127]        ----
> > [ 4872.581854]   lock(&xfs_nondir_ilock_class);
> > [ 4872.586637]   <Interrupt>
> > [ 4872.589558]     lock(&xfs_nondir_ilock_class);
> 
> I'm not seeing that .lock taken in interrupt context.
> 
> I'm wondering how many of your reports are confused by earlier errors
> that  happened.
Hmm, there was no previous error/lockdep/warnings on the console prior to
this AFAICT. It was a fresh trinity run after reboot.

The previous run triggered seq_read/__sb_start_write lockdep and
then xfs XFS_WANT_CORRUPTED_RETURN internal error was highlighted
in another reply was also started from a fresh reboot.

After all of those individual runs it will reliably triggered the
cgroup hang from any systemctl command or "make install" of kernel etc.
   CAI Qian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Oct. 7, 2016, 7:08 a.m. UTC | #6
So I believe this may be just a problem in overlayfs lockdep annotation
(see below). Added Miklos to CC.

On Thu 06-10-16 08:25:59, CAI Qian wrote:
> > > > Not sure if this related, and there is always a lockdep regards procfs
> > > > happened
> > > > below unless masking by other lockdep issues before the cgroup hang.
> > > > Also,
> > > > this
> > > > hang is always reproducible.
> > > 
> > > Sigh...  Let's get the /proc/*/auxv out of the way - this should deal with
> > > it:
> > So I applied both this and the sanity patch, and both original sanity and the
> > proc warnings went away. However, the cgroup hang can still be reproduced as
> > well as this new xfs internal error below,
> 
> Wait. There is also a lockep happened before the xfs internal error as well.
> 
> [ 5839.452325] ======================================================
> [ 5839.459221] [ INFO: possible circular locking dependency detected ]
> [ 5839.466215] 4.8.0-rc8-splice-fixw-proc+ #4 Not tainted
> [ 5839.471945] -------------------------------------------------------
> [ 5839.478937] trinity-c220/69531 is trying to acquire lock:
> [ 5839.484961]  (&p->lock){+.+.+.}, at: [<ffffffff812ac69c>] seq_read+0x4c/0x3e0
> [ 5839.492967] 
> but task is already holding lock:
> [ 5839.499476]  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
> [ 5839.508560] 
> which lock already depends on the new lock.
> 
> [ 5839.517686] 
> the existing dependency chain (in reverse order) is:
> [ 5839.526036] 
> -> #3 (sb_writers#8){.+.+.+}:
> [ 5839.530751]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> [ 5839.537368]        [<ffffffff810f8f4a>] percpu_down_read+0x4a/0x90
> [ 5839.544275]        [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
> [ 5839.551181]        [<ffffffff812a8544>] mnt_want_write+0x24/0x50
> [ 5839.557892]        [<ffffffffa04a398f>] ovl_want_write+0x1f/0x30 [overlay]
> [ 5839.565577]        [<ffffffffa04a6036>] ovl_do_remove+0x46/0x480 [overlay]
> [ 5839.573259]        [<ffffffffa04a64a3>] ovl_unlink+0x13/0x20 [overlay]
> [ 5839.580555]        [<ffffffff812918ea>] vfs_unlink+0xda/0x190
> [ 5839.586979]        [<ffffffff81293698>] do_unlinkat+0x268/0x2b0
> [ 5839.593599]        [<ffffffff8129419b>] SyS_unlinkat+0x1b/0x30
> [ 5839.600120]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 5839.606836]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
> [ 5839.614231] 

So here is IMO the real culprit: do_unlinkat() grabs fs freeze protection
through mnt_want_write(), we grab also i_rwsem in do_unlinkat() in
I_MUTEX_PARENT class a bit after that and further down in vfs_unlink() we
grab i_rwsem for the unlinked inode itself in default I_MUTEX class. Then
in ovl_want_write() we grab freeze protection again, but this time for the
upper filesystem. That establishes sb_writers (overlay) -> I_MUTEX_PARENT
(overlay) -> I_MUTEX (overlay) -> sb_writers (FS-A) lock ordering
(we maintain locking classes per fs type so that's why I'm showing fs type
in parenthesis).

Now this nesting is nasty because once you add locks that are not tracked
per fs type into the mix, you get cycles. In this case we've got
seq_file->lock and cred_guard_mutex into the mix - the splice path is
doing sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex (splicing
from seq_file into the real filesystem). Exec path further establishes
cred_guard_mutex -> I_MUTEX (overlay) which closes the full cycle:

sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex -> i_mutex
(overlay) -> sb_writers (FS-A)

If I analyzed the lockdep trace, this looks like a real (although remote)
deadlock possibility. Miklos?

								Honza

> -> #2 (&sb->s_type->i_mutex_key#17){++++++}:
> [ 5839.620399]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> [ 5839.627015]        [<ffffffff817d1b77>] down_read+0x47/0x70
> [ 5839.633242]        [<ffffffff8128cfd2>] lookup_slow+0xc2/0x1f0
> [ 5839.639762]        [<ffffffff8128f6f2>] walk_component+0x172/0x220
> [ 5839.646668]        [<ffffffff81290fd6>] link_path_walk+0x1a6/0x620
> [ 5839.653574]        [<ffffffff81291a81>] path_openat+0xe1/0xdb0
> [ 5839.660092]        [<ffffffff812939e1>] do_filp_open+0x91/0x100
> [ 5839.666707]        [<ffffffff81288e06>] do_open_execat+0x76/0x180
> [ 5839.673517]        [<ffffffff81288f3b>] open_exec+0x2b/0x50
> [ 5839.679743]        [<ffffffff812eccf3>] load_elf_binary+0x2a3/0x10a0
> [ 5839.686844]        [<ffffffff81288917>] search_binary_handler+0x97/0x1d0
> [ 5839.694331]        [<ffffffff81289ed8>] do_execveat_common.isra.35+0x678/0x9a0
> [ 5839.702400]        [<ffffffff8128a4da>] SyS_execve+0x3a/0x50
> [ 5839.708726]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 5839.715441]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
> [ 5839.722833] 
> -> #1 (&sig->cred_guard_mutex){+.+.+.}:
> [ 5839.728510]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> [ 5839.735126]        [<ffffffff817cfc66>] mutex_lock_killable_nested+0x86/0x540
> [ 5839.743097]        [<ffffffff81301e84>] lock_trace+0x24/0x60
> [ 5839.749421]        [<ffffffff8130224d>] proc_pid_syscall+0x2d/0x110
> [ 5839.756423]        [<ffffffff81302af0>] proc_single_show+0x50/0x90
> [ 5839.763330]        [<ffffffff812ab867>] traverse+0xf7/0x210
> [ 5839.769557]        [<ffffffff812ac9eb>] seq_read+0x39b/0x3e0
> [ 5839.775884]        [<ffffffff81280573>] do_loop_readv_writev+0x83/0xc0
> [ 5839.783179]        [<ffffffff81281a03>] do_readv_writev+0x213/0x230
> [ 5839.790181]        [<ffffffff81281a59>] vfs_readv+0x39/0x50
> [ 5839.796406]        [<ffffffff81281c12>] do_preadv+0xa2/0xc0
> [ 5839.802634]        [<ffffffff81282ec1>] SyS_preadv+0x11/0x20
> [ 5839.808963]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 5839.815681]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
> [ 5839.823075] 
> -> #0 (&p->lock){+.+.+.}:
> [ 5839.827395]        [<ffffffff810fe69c>] __lock_acquire+0x151c/0x1990
> [ 5839.834500]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> [ 5839.841115]        [<ffffffff817cf3b6>] mutex_lock_nested+0x76/0x450
> [ 5839.848219]        [<ffffffff812ac69c>] seq_read+0x4c/0x3e0
> [ 5839.854448]        [<ffffffff8131566b>] kernfs_fop_read+0x12b/0x1b0
> [ 5839.861451]        [<ffffffff81280573>] do_loop_readv_writev+0x83/0xc0
> [ 5839.868742]        [<ffffffff81281a03>] do_readv_writev+0x213/0x230
> [ 5839.875744]        [<ffffffff81281a59>] vfs_readv+0x39/0x50
> [ 5839.881971]        [<ffffffff812bc55a>] default_file_splice_read+0x1aa/0x2c0
> [ 5839.889847]        [<ffffffff812bb913>] do_splice_to+0x73/0x90
> [ 5839.896365]        [<ffffffff812bba1b>] splice_direct_to_actor+0xeb/0x220
> [ 5839.903950]        [<ffffffff812bbbd9>] do_splice_direct+0x89/0xd0
> [ 5839.910857]        [<ffffffff8128261e>] do_sendfile+0x1ce/0x3b0
> [ 5839.917470]        [<ffffffff812831df>] SyS_sendfile64+0x6f/0xd0
> [ 5839.924184]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 5839.930898]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
> [ 5839.938286] 
> other info that might help us debug this:
> 
> [ 5839.947217] Chain exists of:
>   &p->lock --> &sb->s_type->i_mutex_key#17 --> sb_writers#8
> 
> [ 5839.956615]  Possible unsafe locking scenario:
> 
> [ 5839.963218]        CPU0                    CPU1
> [ 5839.968269]        ----                    ----
> [ 5839.973321]   lock(sb_writers#8);
> [ 5839.977046]                                lock(&sb->s_type->i_mutex_key#17);
> [ 5839.985037]                                lock(sb_writers#8);
> [ 5839.991573]   lock(&p->lock);
> [ 5839.994900] 
>  *** DEADLOCK ***
> 
> [ 5840.001503] 1 lock held by trinity-c220/69531:
> [ 5840.006457]  #0:  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
> [ 5840.016031] 
> stack backtrace:
> [ 5840.020891] CPU: 12 PID: 69531 Comm: trinity-c220 Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 5840.030306] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0044.R00.1501191641 01/19/2015
> [ 5840.041660]  0000000000000086 00000000a1ef62f8 ffff8803ca52f7c0 ffffffff813d2ecc
> [ 5840.049952]  ffffffff82a41160 ffffffff82a913e0 ffff8803ca52f800 ffffffff811dd630
> [ 5840.058245]  ffff8803ca52f840 ffff880392c4ecc8 ffff880392c4e000 0000000000000001
> [ 5840.066537] Call Trace:
> [ 5840.069266]  [<ffffffff813d2ecc>] dump_stack+0x85/0xc9
> [ 5840.075000]  [<ffffffff811dd630>] print_circular_bug+0x1f9/0x207
> [ 5840.081701]  [<ffffffff810fe69c>] __lock_acquire+0x151c/0x1990
> [ 5840.088208]  [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> [ 5840.094232]  [<ffffffff812ac69c>] ? seq_read+0x4c/0x3e0
> [ 5840.100061]  [<ffffffff812ac69c>] ? seq_read+0x4c/0x3e0
> [ 5840.105891]  [<ffffffff817cf3b6>] mutex_lock_nested+0x76/0x450
> [ 5840.112397]  [<ffffffff812ac69c>] ? seq_read+0x4c/0x3e0
> [ 5840.118228]  [<ffffffff810fb3e9>] ? __lock_is_held+0x49/0x70
> [ 5840.124540]  [<ffffffff812ac69c>] seq_read+0x4c/0x3e0
> [ 5840.130175]  [<ffffffff81315540>] ? kernfs_vma_page_mkwrite+0x90/0x90
> [ 5840.137360]  [<ffffffff8131566b>] kernfs_fop_read+0x12b/0x1b0
> [ 5840.143770]  [<ffffffff81315540>] ? kernfs_vma_page_mkwrite+0x90/0x90
> [ 5840.150956]  [<ffffffff81280573>] do_loop_readv_writev+0x83/0xc0
> [ 5840.157657]  [<ffffffff81315540>] ? kernfs_vma_page_mkwrite+0x90/0x90
> [ 5840.164843]  [<ffffffff81281a03>] do_readv_writev+0x213/0x230
> [ 5840.171255]  [<ffffffff81418cf9>] ? __pipe_get_pages+0x24/0x9b
> [ 5840.177762]  [<ffffffff813e6f0f>] ? iov_iter_get_pages_alloc+0x19f/0x360
> [ 5840.185240]  [<ffffffff810fd5f2>] ? __lock_acquire+0x472/0x1990
> [ 5840.191843]  [<ffffffff81281a59>] vfs_readv+0x39/0x50
> [ 5840.197478]  [<ffffffff812bc55a>] default_file_splice_read+0x1aa/0x2c0
> [ 5840.204763]  [<ffffffff810cba89>] ? __might_sleep+0x49/0x80
> [ 5840.210980]  [<ffffffff81349c93>] ? security_file_permission+0xa3/0xc0
> [ 5840.218264]  [<ffffffff812bb913>] do_splice_to+0x73/0x90
> [ 5840.224190]  [<ffffffff812bba1b>] splice_direct_to_actor+0xeb/0x220
> [ 5840.231182]  [<ffffffff812baee0>] ? generic_pipe_buf_nosteal+0x10/0x10
> [ 5840.238465]  [<ffffffff812bbbd9>] do_splice_direct+0x89/0xd0
> [ 5840.244778]  [<ffffffff8128261e>] do_sendfile+0x1ce/0x3b0
> [ 5840.250802]  [<ffffffff812831df>] SyS_sendfile64+0x6f/0xd0
> [ 5840.256922]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 5840.263042]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> 
>    CAI Qian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Oct. 7, 2016, 9:27 a.m. UTC | #7
On Thu, Oct 06, 2016 at 08:20:17AM -0400, CAI Qian wrote:
> 
> 
> ----- Original Message -----
> > From: "Al Viro" <viro@ZenIV.linux.org.uk>
> > To: "CAI Qian" <caiqian@redhat.com>
> > Cc: "tj" <tj@kernel.org>, "Linus Torvalds" <torvalds@linux-foundation.org>, "Dave Chinner" <david@fromorbit.com>,
> > "linux-xfs" <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>, "Nick Piggin" <npiggin@gmail.com>,
> > linux-fsdevel@vger.kernel.org
> > Sent: Wednesday, October 5, 2016 4:05:22 PM
> > Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> > 
> > On Wed, Oct 05, 2016 at 02:57:04PM -0400, CAI Qian wrote:
> > 
> > > Not sure if this related, and there is always a lockdep regards procfs
> > > happened
> > > below unless masking by other lockdep issues before the cgroup hang. Also,
> > > this
> > > hang is always reproducible.
> > 
> > Sigh...  Let's get the /proc/*/auxv out of the way - this should deal with
> > it:
> So I applied both this and the sanity patch, and both original sanity and the
> proc warnings went away. However, the cgroup hang can still be reproduced as
> well as this new xfs internal error below,
> 
> [16921.141233] XFS (dm-0): Internal error XFS_WANT_CORRUPTED_RETURN at line 5619 of file fs/xfs/libxfs/xfs_bmap.c.  Caller xfs_bmap_shift_extents+0x1cc/0x3a0 [xfs]
> [16921.157694] CPU: 9 PID: 52920 Comm: trinity-c108 Not tainted 4.8.0-rc8-splice-fixw-proc+ #4

iIt found a delayed allocation extent in the extent map after
flushing all the dirty data in the file. Something else has gone
wrong, this corruption detection is just the messenger. Maybe
memory corruption?

Cheers,

Dave.
Dave Chinner Oct. 7, 2016, 9:57 a.m. UTC | #8
On Thu, Oct 06, 2016 at 10:00:08AM -0700, Linus Torvalds wrote:
> On Thu, Oct 6, 2016 at 9:11 AM, CAI Qian <caiqian@redhat.com> wrote:
> >
> >>
> >> Wait. There is also a lockep happened before the xfs internal error as well.
> > Some other lockdep this time,
> 
> This one looks just bogus.
> 
> > [ 4872.569797]  Possible unsafe locking scenario:
> > [ 4872.569797]
> > [ 4872.576401]        CPU0
> > [ 4872.579127]        ----
> > [ 4872.581854]   lock(&xfs_nondir_ilock_class);
> > [ 4872.586637]   <Interrupt>
> > [ 4872.589558]     lock(&xfs_nondir_ilock_class);
> 
> I'm not seeing that .lock taken in interrupt context.

It's a memory allocation vs reclaim context warning, not a lock
warning. That overloads the lock vs interrupt lockdep mechanism, so
if lockdep sees a context violation it is reported as an "interrupt
context" lock problem.

The allocation context in question is in a function that can be
called from both inside and outside a transaction context. When
outside a transaction, it's a GFP_KERNEL allocation, when inside
it's a GFP_NOFS context.  However, both allocation contexts hold the
inode ilock over the allocation.

the inode shrinker (reclaim context) also happens to take the inode
ilock, and that's what lockdep is complaining about. i.e. it thinks
that this path ilock -> alloc(GFP_KERNEL) -> reclaim -> ilock can
deadlock. But it can't - the ilock held at the upper side is a
referenced inode and can't be seen by reclaim, and the ilocks taken
by reclaim are inodes that can't be seen or referenced by the VFS.

i.e. There's no depedencies between the ilocks on either side of
memory allocation, but there's no way of telling lockdep that short
of giving the inodes in reclaim a different lock class. We used to
do that, but that was a nasty hack and prevented lockdep from
verifying locking orders used on inodes and objects in reclaim
matched the locking orders of referenced inodes...

We've historically shut these false positives up by simply making
all the allocations in these dual context paths GFP_NOFS. However, I
recently got told not to do that by someone on the mm side because
it exacerbated deficiencies in memory reclaim when too many
allocations use GFP_NOFS.

So it's not "fixed" and instead I'm ignoring it.  If you spend any
amount of time running lockdep on XFS you'll get as sick and tired
of playing this whack-a-lockdep-false-positive game as I am.

Cheers,

Dave.
CAI Qian Oct. 7, 2016, 2:43 p.m. UTC | #9
----- Original Message -----
> From: "Jan Kara" <jack@suse.cz>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: "Al Viro" <viro@ZenIV.linux.org.uk>, "tj" <tj@kernel.org>, "Linus Torvalds" <torvalds@linux-foundation.org>,
> "Dave Chinner" <david@fromorbit.com>, "linux-xfs" <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>, "Nick
> Piggin" <npiggin@gmail.com>, linux-fsdevel@vger.kernel.org, "Miklos Szeredi" <miklos@szeredi.hu>
> Sent: Friday, October 7, 2016 3:08:38 AM
> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> 
> 
> So I believe this may be just a problem in overlayfs lockdep annotation
> (see below). Added Miklos to CC.
> 
> On Thu 06-10-16 08:25:59, CAI Qian wrote:
> > > > > Not sure if this related, and there is always a lockdep regards
> > > > > procfs
> > > > > happened
> > > > > below unless masking by other lockdep issues before the cgroup hang.
> > > > > Also,
> > > > > this
> > > > > hang is always reproducible.
> > > > 
> > > > Sigh...  Let's get the /proc/*/auxv out of the way - this should deal
> > > > with
> > > > it:
> > > So I applied both this and the sanity patch, and both original sanity and
> > > the
> > > proc warnings went away. However, the cgroup hang can still be reproduced
> > > as
> > > well as this new xfs internal error below,
> > 
> > Wait. There is also a lockep happened before the xfs internal error as
> > well.
> > 
> > [ 5839.452325] ======================================================
> > [ 5839.459221] [ INFO: possible circular locking dependency detected ]
> > [ 5839.466215] 4.8.0-rc8-splice-fixw-proc+ #4 Not tainted
> > [ 5839.471945] -------------------------------------------------------
> > [ 5839.478937] trinity-c220/69531 is trying to acquire lock:
> > [ 5839.484961]  (&p->lock){+.+.+.}, at: [<ffffffff812ac69c>]
> > seq_read+0x4c/0x3e0
> > [ 5839.492967]
> > but task is already holding lock:
> > [ 5839.499476]  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>]
> > __sb_start_write+0xd1/0xf0
> > [ 5839.508560]
> > which lock already depends on the new lock.
> > 
> > [ 5839.517686]
> > the existing dependency chain (in reverse order) is:
> > [ 5839.526036]
> > -> #3 (sb_writers#8){.+.+.+}:
> > [ 5839.530751]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> > [ 5839.537368]        [<ffffffff810f8f4a>] percpu_down_read+0x4a/0x90
> > [ 5839.544275]        [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
> > [ 5839.551181]        [<ffffffff812a8544>] mnt_want_write+0x24/0x50
> > [ 5839.557892]        [<ffffffffa04a398f>] ovl_want_write+0x1f/0x30
> > [overlay]
> > [ 5839.565577]        [<ffffffffa04a6036>] ovl_do_remove+0x46/0x480
> > [overlay]
> > [ 5839.573259]        [<ffffffffa04a64a3>] ovl_unlink+0x13/0x20 [overlay]
> > [ 5839.580555]        [<ffffffff812918ea>] vfs_unlink+0xda/0x190
> > [ 5839.586979]        [<ffffffff81293698>] do_unlinkat+0x268/0x2b0
> > [ 5839.593599]        [<ffffffff8129419b>] SyS_unlinkat+0x1b/0x30
> > [ 5839.600120]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> > [ 5839.606836]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
> > [ 5839.614231]
> 
> So here is IMO the real culprit: do_unlinkat() grabs fs freeze protection
> through mnt_want_write(), we grab also i_rwsem in do_unlinkat() in
> I_MUTEX_PARENT class a bit after that and further down in vfs_unlink() we
> grab i_rwsem for the unlinked inode itself in default I_MUTEX class. Then
> in ovl_want_write() we grab freeze protection again, but this time for the
> upper filesystem. That establishes sb_writers (overlay) -> I_MUTEX_PARENT
> (overlay) -> I_MUTEX (overlay) -> sb_writers (FS-A) lock ordering
> (we maintain locking classes per fs type so that's why I'm showing fs type
> in parenthesis).
> 
> Now this nesting is nasty because once you add locks that are not tracked
> per fs type into the mix, you get cycles. In this case we've got
> seq_file->lock and cred_guard_mutex into the mix - the splice path is
> doing sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex (splicing
> from seq_file into the real filesystem). Exec path further establishes
> cred_guard_mutex -> I_MUTEX (overlay) which closes the full cycle:
> 
> sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex -> i_mutex
> (overlay) -> sb_writers (FS-A)
> 
> If I analyzed the lockdep trace, this looks like a real (although remote)
> deadlock possibility. Miklos?
> 
> 								Honza
> 
> > -> #2 (&sb->s_type->i_mutex_key#17){++++++}:
> > [ 5839.620399]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> > [ 5839.627015]        [<ffffffff817d1b77>] down_read+0x47/0x70
> > [ 5839.633242]        [<ffffffff8128cfd2>] lookup_slow+0xc2/0x1f0
> > [ 5839.639762]        [<ffffffff8128f6f2>] walk_component+0x172/0x220
> > [ 5839.646668]        [<ffffffff81290fd6>] link_path_walk+0x1a6/0x620
> > [ 5839.653574]        [<ffffffff81291a81>] path_openat+0xe1/0xdb0
> > [ 5839.660092]        [<ffffffff812939e1>] do_filp_open+0x91/0x100
> > [ 5839.666707]        [<ffffffff81288e06>] do_open_execat+0x76/0x180
> > [ 5839.673517]        [<ffffffff81288f3b>] open_exec+0x2b/0x50
> > [ 5839.679743]        [<ffffffff812eccf3>] load_elf_binary+0x2a3/0x10a0
> > [ 5839.686844]        [<ffffffff81288917>] search_binary_handler+0x97/0x1d0
> > [ 5839.694331]        [<ffffffff81289ed8>]
> > do_execveat_common.isra.35+0x678/0x9a0
> > [ 5839.702400]        [<ffffffff8128a4da>] SyS_execve+0x3a/0x50
> > [ 5839.708726]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> > [ 5839.715441]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
> > [ 5839.722833]
> > -> #1 (&sig->cred_guard_mutex){+.+.+.}:
> > [ 5839.728510]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> > [ 5839.735126]        [<ffffffff817cfc66>]
> > mutex_lock_killable_nested+0x86/0x540
> > [ 5839.743097]        [<ffffffff81301e84>] lock_trace+0x24/0x60
> > [ 5839.749421]        [<ffffffff8130224d>] proc_pid_syscall+0x2d/0x110
> > [ 5839.756423]        [<ffffffff81302af0>] proc_single_show+0x50/0x90
> > [ 5839.763330]        [<ffffffff812ab867>] traverse+0xf7/0x210
> > [ 5839.769557]        [<ffffffff812ac9eb>] seq_read+0x39b/0x3e0
> > [ 5839.775884]        [<ffffffff81280573>] do_loop_readv_writev+0x83/0xc0
> > [ 5839.783179]        [<ffffffff81281a03>] do_readv_writev+0x213/0x230
> > [ 5839.790181]        [<ffffffff81281a59>] vfs_readv+0x39/0x50
> > [ 5839.796406]        [<ffffffff81281c12>] do_preadv+0xa2/0xc0
> > [ 5839.802634]        [<ffffffff81282ec1>] SyS_preadv+0x11/0x20
> > [ 5839.808963]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> > [ 5839.815681]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
> > [ 5839.823075]
> > -> #0 (&p->lock){+.+.+.}:
> > [ 5839.827395]        [<ffffffff810fe69c>] __lock_acquire+0x151c/0x1990
> > [ 5839.834500]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> > [ 5839.841115]        [<ffffffff817cf3b6>] mutex_lock_nested+0x76/0x450
> > [ 5839.848219]        [<ffffffff812ac69c>] seq_read+0x4c/0x3e0
> > [ 5839.854448]        [<ffffffff8131566b>] kernfs_fop_read+0x12b/0x1b0
> > [ 5839.861451]        [<ffffffff81280573>] do_loop_readv_writev+0x83/0xc0
> > [ 5839.868742]        [<ffffffff81281a03>] do_readv_writev+0x213/0x230
> > [ 5839.875744]        [<ffffffff81281a59>] vfs_readv+0x39/0x50
> > [ 5839.881971]        [<ffffffff812bc55a>]
> > default_file_splice_read+0x1aa/0x2c0
> > [ 5839.889847]        [<ffffffff812bb913>] do_splice_to+0x73/0x90
> > [ 5839.896365]        [<ffffffff812bba1b>]
> > splice_direct_to_actor+0xeb/0x220
> > [ 5839.903950]        [<ffffffff812bbbd9>] do_splice_direct+0x89/0xd0
> > [ 5839.910857]        [<ffffffff8128261e>] do_sendfile+0x1ce/0x3b0
> > [ 5839.917470]        [<ffffffff812831df>] SyS_sendfile64+0x6f/0xd0
> > [ 5839.924184]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> > [ 5839.930898]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
> > [ 5839.938286]
> > other info that might help us debug this:
> > 
> > [ 5839.947217] Chain exists of:
> >   &p->lock --> &sb->s_type->i_mutex_key#17 --> sb_writers#8
> > 
> > [ 5839.956615]  Possible unsafe locking scenario:
> > 
> > [ 5839.963218]        CPU0                    CPU1
> > [ 5839.968269]        ----                    ----
> > [ 5839.973321]   lock(sb_writers#8);
> > [ 5839.977046]
> > lock(&sb->s_type->i_mutex_key#17);
> > [ 5839.985037]                                lock(sb_writers#8);
> > [ 5839.991573]   lock(&p->lock);
> > [ 5839.994900]
> >  *** DEADLOCK ***
> > 
> > [ 5840.001503] 1 lock held by trinity-c220/69531:
> > [ 5840.006457]  #0:  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>]
> > __sb_start_write+0xd1/0xf0
> > [ 5840.016031]
> > stack backtrace:
> > [ 5840.020891] CPU: 12 PID: 69531 Comm: trinity-c220 Not tainted
> > 4.8.0-rc8-splice-fixw-proc+ #4
> > [ 5840.030306] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS
> > GRNDSDP1.86B.0044.R00.1501191641 01/19/2015
> > [ 5840.041660]  0000000000000086 00000000a1ef62f8 ffff8803ca52f7c0
> > ffffffff813d2ecc
> > [ 5840.049952]  ffffffff82a41160 ffffffff82a913e0 ffff8803ca52f800
> > ffffffff811dd630
> > [ 5840.058245]  ffff8803ca52f840 ffff880392c4ecc8 ffff880392c4e000
> > 0000000000000001
> > [ 5840.066537] Call Trace:
> > [ 5840.069266]  [<ffffffff813d2ecc>] dump_stack+0x85/0xc9
> > [ 5840.075000]  [<ffffffff811dd630>] print_circular_bug+0x1f9/0x207
> > [ 5840.081701]  [<ffffffff810fe69c>] __lock_acquire+0x151c/0x1990
> > [ 5840.088208]  [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> > [ 5840.094232]  [<ffffffff812ac69c>] ? seq_read+0x4c/0x3e0
> > [ 5840.100061]  [<ffffffff812ac69c>] ? seq_read+0x4c/0x3e0
> > [ 5840.105891]  [<ffffffff817cf3b6>] mutex_lock_nested+0x76/0x450
> > [ 5840.112397]  [<ffffffff812ac69c>] ? seq_read+0x4c/0x3e0
> > [ 5840.118228]  [<ffffffff810fb3e9>] ? __lock_is_held+0x49/0x70
> > [ 5840.124540]  [<ffffffff812ac69c>] seq_read+0x4c/0x3e0
> > [ 5840.130175]  [<ffffffff81315540>] ? kernfs_vma_page_mkwrite+0x90/0x90
> > [ 5840.137360]  [<ffffffff8131566b>] kernfs_fop_read+0x12b/0x1b0
> > [ 5840.143770]  [<ffffffff81315540>] ? kernfs_vma_page_mkwrite+0x90/0x90
> > [ 5840.150956]  [<ffffffff81280573>] do_loop_readv_writev+0x83/0xc0
> > [ 5840.157657]  [<ffffffff81315540>] ? kernfs_vma_page_mkwrite+0x90/0x90
> > [ 5840.164843]  [<ffffffff81281a03>] do_readv_writev+0x213/0x230
> > [ 5840.171255]  [<ffffffff81418cf9>] ? __pipe_get_pages+0x24/0x9b
> > [ 5840.177762]  [<ffffffff813e6f0f>] ? iov_iter_get_pages_alloc+0x19f/0x360
> > [ 5840.185240]  [<ffffffff810fd5f2>] ? __lock_acquire+0x472/0x1990
> > [ 5840.191843]  [<ffffffff81281a59>] vfs_readv+0x39/0x50
> > [ 5840.197478]  [<ffffffff812bc55a>] default_file_splice_read+0x1aa/0x2c0
> > [ 5840.204763]  [<ffffffff810cba89>] ? __might_sleep+0x49/0x80
> > [ 5840.210980]  [<ffffffff81349c93>] ? security_file_permission+0xa3/0xc0
> > [ 5840.218264]  [<ffffffff812bb913>] do_splice_to+0x73/0x90
> > [ 5840.224190]  [<ffffffff812bba1b>] splice_direct_to_actor+0xeb/0x220
> > [ 5840.231182]  [<ffffffff812baee0>] ? generic_pipe_buf_nosteal+0x10/0x10
> > [ 5840.238465]  [<ffffffff812bbbd9>] do_splice_direct+0x89/0xd0
> > [ 5840.244778]  [<ffffffff8128261e>] do_sendfile+0x1ce/0x3b0
> > [ 5840.250802]  [<ffffffff812831df>] SyS_sendfile64+0x6f/0xd0
> > [ 5840.256922]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> > [ 5840.263042]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
Hmm, this round of trinity triggered a different hang.

[ 2094.403119] INFO: task trinity-c0:3126 blocked for more than 120 seconds.
[ 2094.410705]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2094.417027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2094.425770] trinity-c0      D ffff88044efc3d10 13472  3126   3124 0x00000084
[ 2094.433659]  ffff88044efc3d10 ffffffff00000000 ffff880400000000 ffff880822b5e000
[ 2094.441965]  ffff88044c8b8000 ffff88044efc4000 ffff880443755670 ffff880443755658
[ 2094.450272]  ffffffff00000000 ffff88044c8b8000 ffff88044efc3d28 ffffffff817cdaaf
[ 2094.458572] Call Trace:
[ 2094.461312]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2094.466858]  [<ffffffff817d2782>] rwsem_down_write_failed+0x242/0x4b0
[ 2094.474049]  [<ffffffff817d25ac>] ? rwsem_down_write_failed+0x6c/0x4b0
[ 2094.481352]  [<ffffffff810fd5f2>] ? __lock_acquire+0x472/0x1990
[ 2094.487964]  [<ffffffff813e27b7>] call_rwsem_down_write_failed+0x17/0x30
[ 2094.495450]  [<ffffffff817d1bff>] down_write+0x5f/0x80
[ 2094.501190]  [<ffffffff8127e301>] ? chown_common.isra.12+0x131/0x1e0
[ 2094.508284]  [<ffffffff8127e301>] chown_common.isra.12+0x131/0x1e0
[ 2094.515177]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
[ 2094.521692]  [<ffffffff810cc367>] ? preempt_count_add+0x47/0xc0
[ 2094.528304]  [<ffffffff812a665f>] ? mnt_clone_write+0x3f/0x70
[ 2094.534723]  [<ffffffff8127faef>] SyS_fchown+0x8f/0xa0
[ 2094.540463]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2094.546588]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2094.553784] 2 locks held by trinity-c0/3126:
[ 2094.558552]  #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
[ 2094.568240]  #1:  (&sb->s_type->i_mutex_key#17){++++++}, at: [<ffffffff8127e301>] chown_common.isra.12+0x131/0x1e0
[ 2094.579864] INFO: task trinity-c1:3127 blocked for more than 120 seconds.
[ 2094.587442]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2094.593761] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2094.602503] trinity-c1      D ffff88045a1bbd10 13312  3127   3124 0x00000084
[ 2094.610402]  ffff88045a1bbd10 ffff880443769fe8 ffff880400000000 ffff88046cefe000
[ 2094.618710]  ffff88044c8ba000 ffff88045a1bc000 ffff880443769fd0 ffff88045a1bbd40
[ 2094.627015]  ffff880443769fe8 ffff88044376a158 ffff88045a1bbd28 ffffffff817cdaaf
[ 2094.635321] Call Trace:
[ 2094.638053]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2094.643597]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
[ 2094.650726]  [<ffffffffa0322cca>] ? xfs_file_fsync+0xea/0x2e0 [xfs]
[ 2094.657727]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
[ 2094.665119]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
[ 2094.671457]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
[ 2094.677987]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
[ 2094.684324]  [<ffffffffa0322cca>] xfs_file_fsync+0xea/0x2e0 [xfs]
[ 2094.691133]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
[ 2094.697354]  [<ffffffff812bdc8d>] do_fsync+0x3d/0x70
[ 2094.702896]  [<ffffffff812bdf40>] SyS_fsync+0x10/0x20
[ 2094.708528]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2094.714652]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2094.721844] 1 lock held by trinity-c1/3127:
[ 2094.726515]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
[ 2094.737181] INFO: task trinity-c2:3128 blocked for more than 120 seconds.
[ 2094.744751]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2094.751068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2094.759810] trinity-c2      D ffff8804574f3df8 13472  3128   3124 0x00000084
[ 2094.767692]  ffff8804574f3df8 0000000000000006 0000000000000000 ffff8804569a4000
[ 2094.776002]  ffff88044c8bc000 ffff8804574f4000 ffff8804622eb338 ffff88044c8bc000
[ 2094.784307]  0000000000000246 00000000ffffffff ffff8804574f3e10 ffffffff817cdaaf
[ 2094.792605] Call Trace:
[ 2094.795340]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2094.800886]  [<ffffffff817cdf18>] schedule_preempt_disabled+0x18/0x30
[ 2094.808078]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
[ 2094.814688]  [<ffffffff812a5313>] ? __fdget_pos+0x43/0x50
[ 2094.820715]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[ 2094.826544]  [<ffffffff81297f53>] SyS_getdents+0x83/0x140
[ 2094.832573]  [<ffffffff81297cd0>] ? fillonedir+0x100/0x100
[ 2094.838699]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2094.844822]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2094.852013] 1 lock held by trinity-c2/3128:
[ 2094.856682]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[ 2094.865969] INFO: task trinity-c3:3129 blocked for more than 120 seconds.
[ 2094.873547]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2094.879864] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2094.888606] trinity-c3      D ffff880455ce3e08 13440  3129   3124 0x00000084
[ 2094.896495]  ffff880455ce3e08 0000000000000006 0000000000000000 ffff88045144e000
[ 2094.904803]  ffff88044c8be000 ffff880455ce4000 ffff8804622eb338 ffff88044c8be000
[ 2094.913111]  0000000000000246 00000000ffffffff ffff880455ce3e20 ffffffff817cdaaf
[ 2094.921418] Call Trace:
[ 2094.924152]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2094.929695]  [<ffffffff817cdf18>] schedule_preempt_disabled+0x18/0x30
[ 2094.936885]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
[ 2094.943496]  [<ffffffff812a5313>] ? __fdget_pos+0x43/0x50
[ 2094.949526]  [<ffffffff8117729f>] ? __audit_syscall_entry+0xaf/0x100
[ 2094.956620]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[ 2094.962454]  [<ffffffff81298091>] SyS_getdents64+0x81/0x130
[ 2094.968675]  [<ffffffff81297a80>] ? iterate_dir+0x190/0x190
[ 2094.974895]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2094.981019]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2094.988204] 1 lock held by trinity-c3/3129:
[ 2094.992872]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[ 2095.002158] INFO: task trinity-c4:3130 blocked for more than 120 seconds.
[ 2095.009734]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2095.016052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2095.024793] trinity-c4      D ffff880458997e28 13392  3130   3124 0x00000084
[ 2095.032690]  ffff880458997e28 0000000000000006 0000000000000000 ffff88046ca18000
[ 2095.040995]  ffff880458998000 ffff880458998000 ffff8804622eb338 ffff880458998000
[ 2095.049342]  0000000000000246 00000000ffffffff ffff880458997e40 ffffffff817cdaaf
[ 2095.057650] Call Trace:
[ 2095.060382]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2095.065926]  [<ffffffff817cdf18>] schedule_preempt_disabled+0x18/0x30
[ 2095.073118]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
[ 2095.079728]  [<ffffffff812a5313>] ? __fdget_pos+0x43/0x50
[ 2095.085757]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[ 2095.091589]  [<ffffffff812811dd>] SyS_lseek+0x1d/0xb0
[ 2095.097229]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2095.103355]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2095.110547] 1 lock held by trinity-c4/3130:
[ 2095.115216]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[ 2095.124507] INFO: task trinity-c5:3131 blocked for more than 120 seconds.
[ 2095.132083]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2095.138402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2095.147135] trinity-c5      D ffff88045a12bae0 13472  3131   3124 0x00000084
[ 2095.155034]  ffff88045a12bae0 ffff880443769fe8 ffff880400000000 ffff88046ca1a000
[ 2095.163339]  ffff88045899a000 ffff88045a12c000 ffff880443769fd0 ffff88045a12bb10
[ 2095.171645]  ffff880443769fe8 0000000000000000 ffff88045a12baf8 ffffffff817cdaaf
[ 2095.179952] Call Trace:
[ 2095.182684]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2095.188230]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
[ 2095.195341]  [<ffffffffa03337d4>] ? xfs_ilock_attr_map_shared+0x34/0x40 [xfs]
[ 2095.203310]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
[ 2095.210696]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
[ 2095.217029]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
[ 2095.223558]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
[ 2095.229894]  [<ffffffffa03337d4>] xfs_ilock_attr_map_shared+0x34/0x40 [xfs]
[ 2095.237682]  [<ffffffffa02ccfaf>] xfs_attr_get+0xdf/0x1b0 [xfs]
[ 2095.244312]  [<ffffffffa0341bfc>] xfs_xattr_get+0x4c/0x70 [xfs]
[ 2095.250924]  [<ffffffff812ad269>] generic_getxattr+0x59/0x70
[ 2095.257244]  [<ffffffff812acf9b>] vfs_getxattr+0x8b/0xb0
[ 2095.263177]  [<ffffffffa0435bd6>] ovl_xattr_get+0x46/0x60 [overlay]
[ 2095.270176]  [<ffffffffa04331aa>] ovl_other_xattr_get+0x1a/0x20 [overlay]
[ 2095.277756]  [<ffffffff812ad269>] generic_getxattr+0x59/0x70
[ 2095.284079]  [<ffffffff81345e9e>] cap_inode_need_killpriv+0x2e/0x40
[ 2095.291078]  [<ffffffff81349a33>] security_inode_need_killpriv+0x33/0x50
[ 2095.298560]  [<ffffffff812a2fb0>] dentry_needs_remove_privs+0x30/0x50
[ 2095.305743]  [<ffffffff8127ea21>] do_truncate+0x51/0xc0
[ 2095.311581]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
[ 2095.318094]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
[ 2095.324609]  [<ffffffff8127edde>] do_sys_ftruncate.constprop.15+0xfe/0x160
[ 2095.332286]  [<ffffffff8127ee7e>] SyS_ftruncate+0xe/0x10
[ 2095.338225]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2095.344339]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2095.351531] 2 locks held by trinity-c5/3131:
[ 2095.356297]  #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
[ 2095.365983]  #1:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
[ 2095.376647] INFO: task trinity-c6:3132 blocked for more than 120 seconds.
[ 2095.384216]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2095.390535] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2095.399275] trinity-c6      D ffff88044da5fd30 13312  3132   3124 0x00000084
[ 2095.407177]  ffff88044da5fd30 ffffffff00000000 ffff880400000000 ffff880459858000
[ 2095.415485]  ffff88045899c000 ffff88044da60000 ffff880443755670 ffff880443755658
[ 2095.423789]  ffffffff00000000 ffff88045899c000 ffff88044da5fd48 ffffffff817cdaaf
[ 2095.432094] Call Trace:
[ 2095.434825]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2095.440372]  [<ffffffff817d2782>] rwsem_down_write_failed+0x242/0x4b0
[ 2095.447565]  [<ffffffff817d25ac>] ? rwsem_down_write_failed+0x6c/0x4b0
[ 2095.454854]  [<ffffffff813e27b7>] call_rwsem_down_write_failed+0x17/0x30
[ 2095.462337]  [<ffffffff817d1bff>] down_write+0x5f/0x80
[ 2095.468077]  [<ffffffff8127e413>] ? chmod_common+0x63/0x150
[ 2095.474300]  [<ffffffff8127e413>] chmod_common+0x63/0x150
[ 2095.480327]  [<ffffffff8117729f>] ? __audit_syscall_entry+0xaf/0x100
[ 2095.487421]  [<ffffffff810035cc>] ? syscall_trace_enter+0x1dc/0x390
[ 2095.494418]  [<ffffffff8127f5f2>] SyS_fchmod+0x52/0x80
[ 2095.500155]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2095.506270]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2095.513452] 2 locks held by trinity-c6/3132:
[ 2095.518217]  #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
[ 2095.527895]  #1:  (&sb->s_type->i_mutex_key#17){++++++}, at: [<ffffffff8127e413>] chmod_common+0x63/0x150
[ 2095.538648] INFO: task trinity-c7:3133 blocked for more than 120 seconds.
[ 2095.546227]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2095.552544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2095.561288] trinity-c7      D ffff88044d393d10 13472  3133   3124 0x00000084
[ 2095.569188]  ffff88044d393d10 ffff880443769fe8 ffff880400000000 ffff88086ce68000
[ 2095.577491]  ffff88045899e000 ffff88044d394000 ffff880443769fd0 ffff88044d393d40
[ 2095.585796]  ffff880443769fe8 ffff88044376a158 ffff88044d393d28 ffffffff817cdaaf
[ 2095.594103] Call Trace:
[ 2095.596836]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2095.602379]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
[ 2095.609491]  [<ffffffffa0322cca>] ? xfs_file_fsync+0xea/0x2e0 [xfs]
[ 2095.616490]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
[ 2095.623877]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
[ 2095.630212]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
[ 2095.636740]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
[ 2095.643076]  [<ffffffffa0322cca>] xfs_file_fsync+0xea/0x2e0 [xfs]
[ 2095.649889]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
[ 2095.656109]  [<ffffffff812bdc8d>] do_fsync+0x3d/0x70
[ 2095.661653]  [<ffffffff812bdf40>] SyS_fsync+0x10/0x20
[ 2095.667291]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2095.673417]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2095.680610] 1 lock held by trinity-c7/3133:
[ 2095.685281]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
[ 2095.695947] INFO: task trinity-c8:3135 blocked for more than 120 seconds.
[ 2095.703530]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2095.709848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2095.718590] trinity-c8      D ffff88044d3b3d10 12912  3135   3124 0x00000084
[ 2095.726470]  ffff88044d3b3d10 ffff880443769fe8 ffff880400000000 ffff88046ca30000
[ 2095.734775]  ffff88044d3a8000 ffff88044d3b4000 ffff880443769fd0 ffff88044d3b3d40
[ 2095.743083]  ffff880443769fe8 ffff88044376a158 ffff88044d3b3d28 ffffffff817cdaaf
[ 2095.751387] Call Trace:
[ 2095.754119]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2095.759662]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
[ 2095.766772]  [<ffffffffa0322cca>] ? xfs_file_fsync+0xea/0x2e0 [xfs]
[ 2095.773763]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
[ 2095.781148]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
[ 2095.787482]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
[ 2095.794013]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
[ 2095.800347]  [<ffffffffa0322cca>] xfs_file_fsync+0xea/0x2e0 [xfs]
[ 2095.807155]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
[ 2095.813377]  [<ffffffff812bdc8d>] do_fsync+0x3d/0x70
[ 2095.818921]  [<ffffffff812bdf63>] SyS_fdatasync+0x13/0x20
[ 2095.824949]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2095.831074]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2095.838261] 1 lock held by trinity-c8/3135:
[ 2095.842930]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
[ 2095.853588] INFO: task trinity-c9:3136 blocked for more than 120 seconds.
[ 2095.861167]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[ 2095.867485] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2095.876228] trinity-c9      D ffff88045b3679e0 13328  3136   3124 0x00000084
[ 2095.884111]  ffff88045b3679e0 ffff880443769fe8 ffff880400000000 ffff88086ce56000
[ 2095.892417]  ffff88044d3aa000 ffff88045b368000 ffff880443769fd0 ffff88045b367a10
[ 2095.900721]  ffff880443769fe8 ffff88044376a1e8 ffff88045b3679f8 ffffffff817cdaaf
[ 2095.909024] Call Trace:
[ 2095.911761]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 2095.917305]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
[ 2095.924414]  [<ffffffffa0333790>] ? xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[ 2095.932383]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
[ 2095.939768]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
[ 2095.946104]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
[ 2095.952632]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
[ 2095.958968]  [<ffffffffa0333790>] xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[ 2095.966752]  [<ffffffffa03128c6>] __xfs_get_blocks+0x96/0x9d0 [xfs]
[ 2095.973753]  [<ffffffff8126462e>] ? mem_cgroup_event_ratelimit.isra.39+0x3e/0xb0
[ 2095.982012]  [<ffffffff8126e8e5>] ? mem_cgroup_commit_charge+0x95/0x110
[ 2095.989413]  [<ffffffffa0313214>] xfs_get_blocks+0x14/0x20 [xfs]
[ 2095.996122]  [<ffffffff812cca44>] do_mpage_readpage+0x474/0x800
[ 2096.002745]  [<ffffffffa0313200>] ? __xfs_get_blocks+0x9d0/0x9d0 [xfs]
[ 2096.010037]  [<ffffffff81402fd7>] ? debug_smp_processor_id+0x17/0x20
[ 2096.017136]  [<ffffffff811f3565>] ? __lru_cache_add+0x75/0xb0
[ 2096.023551]  [<ffffffff811f45fe>] ? lru_cache_add+0xe/0x10
[ 2096.029678]  [<ffffffff812ccf0d>] mpage_readpages+0x13d/0x1b0
[ 2096.036109]  [<ffffffffa0313200>] ? __xfs_get_blocks+0x9d0/0x9d0 [xfs]
[ 2096.043420]  [<ffffffffa0313200>] ? __xfs_get_blocks+0x9d0/0x9d0 [xfs]
[ 2096.050724]  [<ffffffffa0311f14>] xfs_vm_readpages+0x54/0x170 [xfs]
[ 2096.057724]  [<ffffffff811f1a1d>] __do_page_cache_readahead+0x2ad/0x370
[ 2096.065113]  [<ffffffff811f18ec>] ? __do_page_cache_readahead+0x17c/0x370
[ 2096.072693]  [<ffffffff8117729f>] ? __audit_syscall_entry+0xaf/0x100
[ 2096.079787]  [<ffffffff811f2014>] force_page_cache_readahead+0x94/0xf0
[ 2096.087077]  [<ffffffff811f2168>] SyS_readahead+0xa8/0xc0
[ 2096.093106]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[ 2096.099234]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2096.106427] 1 lock held by trinity-c9/3136:
[ 2096.111097]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Oct. 7, 2016, 3:25 p.m. UTC | #10
On Fri, Oct 7, 2016 at 2:57 AM, Dave Chinner <david@fromorbit.com> wrote:
>
> So it's not "fixed" and instead I'm ignoring it.  If you spend any
> amount of time running lockdep on XFS you'll get as sick and tired
> of playing this whack-a-lockdep-false-positive game as I am.

Thanks for the background here. I'll try to remember it for the next
time this comes up, it doesn't help that lockdep reports are often a
bit cryptic to begin with (that "interrupt" thing certainly didn't
help).

             Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
CAI Qian Oct. 7, 2016, 3:27 p.m. UTC | #11
> Hmm, this round of trinity triggered a different hang.
This hang is reproducible so far with the command below on a overlayfs/xfs,

$ trinity -g vfs --arch 64 --disable-fds=sockets --disable-fds=perf --disable-fds=epoll
  --disable-fds=eventfd --disable-fds=pseudo --disable-fds=timerfd --disable-fds=memfd
  --disable-fds=drm
> 
> [ 2094.403119] INFO: task trinity-c0:3126 blocked for more than 120 seconds.
> [ 2094.410705]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2094.417027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2094.425770] trinity-c0      D ffff88044efc3d10 13472  3126   3124
> 0x00000084
> [ 2094.433659]  ffff88044efc3d10 ffffffff00000000 ffff880400000000
> ffff880822b5e000
> [ 2094.441965]  ffff88044c8b8000 ffff88044efc4000 ffff880443755670
> ffff880443755658
> [ 2094.450272]  ffffffff00000000 ffff88044c8b8000 ffff88044efc3d28
> ffffffff817cdaaf
> [ 2094.458572] Call Trace:
> [ 2094.461312]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2094.466858]  [<ffffffff817d2782>] rwsem_down_write_failed+0x242/0x4b0
> [ 2094.474049]  [<ffffffff817d25ac>] ? rwsem_down_write_failed+0x6c/0x4b0
> [ 2094.481352]  [<ffffffff810fd5f2>] ? __lock_acquire+0x472/0x1990
> [ 2094.487964]  [<ffffffff813e27b7>] call_rwsem_down_write_failed+0x17/0x30
> [ 2094.495450]  [<ffffffff817d1bff>] down_write+0x5f/0x80
> [ 2094.501190]  [<ffffffff8127e301>] ? chown_common.isra.12+0x131/0x1e0
> [ 2094.508284]  [<ffffffff8127e301>] chown_common.isra.12+0x131/0x1e0
> [ 2094.515177]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
> [ 2094.521692]  [<ffffffff810cc367>] ? preempt_count_add+0x47/0xc0
> [ 2094.528304]  [<ffffffff812a665f>] ? mnt_clone_write+0x3f/0x70
> [ 2094.534723]  [<ffffffff8127faef>] SyS_fchown+0x8f/0xa0
> [ 2094.540463]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2094.546588]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2094.553784] 2 locks held by trinity-c0/3126:
> [ 2094.558552]  #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff81284be1>]
> __sb_start_write+0xd1/0xf0
> [ 2094.568240]  #1:  (&sb->s_type->i_mutex_key#17){++++++}, at:
> [<ffffffff8127e301>] chown_common.isra.12+0x131/0x1e0
> [ 2094.579864] INFO: task trinity-c1:3127 blocked for more than 120 seconds.
> [ 2094.587442]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2094.593761] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2094.602503] trinity-c1      D ffff88045a1bbd10 13312  3127   3124
> 0x00000084
> [ 2094.610402]  ffff88045a1bbd10 ffff880443769fe8 ffff880400000000
> ffff88046cefe000
> [ 2094.618710]  ffff88044c8ba000 ffff88045a1bc000 ffff880443769fd0
> ffff88045a1bbd40
> [ 2094.627015]  ffff880443769fe8 ffff88044376a158 ffff88045a1bbd28
> ffffffff817cdaaf
> [ 2094.635321] Call Trace:
> [ 2094.638053]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2094.643597]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2094.650726]  [<ffffffffa0322cca>] ? xfs_file_fsync+0xea/0x2e0 [xfs]
> [ 2094.657727]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
> [ 2094.665119]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
> [ 2094.671457]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
> [ 2094.677987]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2094.684324]  [<ffffffffa0322cca>] xfs_file_fsync+0xea/0x2e0 [xfs]
> [ 2094.691133]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
> [ 2094.697354]  [<ffffffff812bdc8d>] do_fsync+0x3d/0x70
> [ 2094.702896]  [<ffffffff812bdf40>] SyS_fsync+0x10/0x20
> [ 2094.708528]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2094.714652]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2094.721844] 1 lock held by trinity-c1/3127:
> [ 2094.726515]  #0:  (&xfs_nondir_ilock_class){++++..}, at:
> [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2094.737181] INFO: task trinity-c2:3128 blocked for more than 120 seconds.
> [ 2094.744751]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2094.751068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2094.759810] trinity-c2      D ffff8804574f3df8 13472  3128   3124
> 0x00000084
> [ 2094.767692]  ffff8804574f3df8 0000000000000006 0000000000000000
> ffff8804569a4000
> [ 2094.776002]  ffff88044c8bc000 ffff8804574f4000 ffff8804622eb338
> ffff88044c8bc000
> [ 2094.784307]  0000000000000246 00000000ffffffff ffff8804574f3e10
> ffffffff817cdaaf
> [ 2094.792605] Call Trace:
> [ 2094.795340]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2094.800886]  [<ffffffff817cdf18>] schedule_preempt_disabled+0x18/0x30
> [ 2094.808078]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
> [ 2094.814688]  [<ffffffff812a5313>] ? __fdget_pos+0x43/0x50
> [ 2094.820715]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
> [ 2094.826544]  [<ffffffff81297f53>] SyS_getdents+0x83/0x140
> [ 2094.832573]  [<ffffffff81297cd0>] ? fillonedir+0x100/0x100
> [ 2094.838699]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2094.844822]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2094.852013] 1 lock held by trinity-c2/3128:
> [ 2094.856682]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>]
> __fdget_pos+0x43/0x50
> [ 2094.865969] INFO: task trinity-c3:3129 blocked for more than 120 seconds.
> [ 2094.873547]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2094.879864] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2094.888606] trinity-c3      D ffff880455ce3e08 13440  3129   3124
> 0x00000084
> [ 2094.896495]  ffff880455ce3e08 0000000000000006 0000000000000000
> ffff88045144e000
> [ 2094.904803]  ffff88044c8be000 ffff880455ce4000 ffff8804622eb338
> ffff88044c8be000
> [ 2094.913111]  0000000000000246 00000000ffffffff ffff880455ce3e20
> ffffffff817cdaaf
> [ 2094.921418] Call Trace:
> [ 2094.924152]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2094.929695]  [<ffffffff817cdf18>] schedule_preempt_disabled+0x18/0x30
> [ 2094.936885]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
> [ 2094.943496]  [<ffffffff812a5313>] ? __fdget_pos+0x43/0x50
> [ 2094.949526]  [<ffffffff8117729f>] ? __audit_syscall_entry+0xaf/0x100
> [ 2094.956620]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
> [ 2094.962454]  [<ffffffff81298091>] SyS_getdents64+0x81/0x130
> [ 2094.968675]  [<ffffffff81297a80>] ? iterate_dir+0x190/0x190
> [ 2094.974895]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2094.981019]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2094.988204] 1 lock held by trinity-c3/3129:
> [ 2094.992872]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>]
> __fdget_pos+0x43/0x50
> [ 2095.002158] INFO: task trinity-c4:3130 blocked for more than 120 seconds.
> [ 2095.009734]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2095.016052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2095.024793] trinity-c4      D ffff880458997e28 13392  3130   3124
> 0x00000084
> [ 2095.032690]  ffff880458997e28 0000000000000006 0000000000000000
> ffff88046ca18000
> [ 2095.040995]  ffff880458998000 ffff880458998000 ffff8804622eb338
> ffff880458998000
> [ 2095.049342]  0000000000000246 00000000ffffffff ffff880458997e40
> ffffffff817cdaaf
> [ 2095.057650] Call Trace:
> [ 2095.060382]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2095.065926]  [<ffffffff817cdf18>] schedule_preempt_disabled+0x18/0x30
> [ 2095.073118]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
> [ 2095.079728]  [<ffffffff812a5313>] ? __fdget_pos+0x43/0x50
> [ 2095.085757]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
> [ 2095.091589]  [<ffffffff812811dd>] SyS_lseek+0x1d/0xb0
> [ 2095.097229]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2095.103355]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2095.110547] 1 lock held by trinity-c4/3130:
> [ 2095.115216]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>]
> __fdget_pos+0x43/0x50
> [ 2095.124507] INFO: task trinity-c5:3131 blocked for more than 120 seconds.
> [ 2095.132083]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2095.138402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2095.147135] trinity-c5      D ffff88045a12bae0 13472  3131   3124
> 0x00000084
> [ 2095.155034]  ffff88045a12bae0 ffff880443769fe8 ffff880400000000
> ffff88046ca1a000
> [ 2095.163339]  ffff88045899a000 ffff88045a12c000 ffff880443769fd0
> ffff88045a12bb10
> [ 2095.171645]  ffff880443769fe8 0000000000000000 ffff88045a12baf8
> ffffffff817cdaaf
> [ 2095.179952] Call Trace:
> [ 2095.182684]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2095.188230]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2095.195341]  [<ffffffffa03337d4>] ? xfs_ilock_attr_map_shared+0x34/0x40
> [xfs]
> [ 2095.203310]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
> [ 2095.210696]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
> [ 2095.217029]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.223558]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.229894]  [<ffffffffa03337d4>] xfs_ilock_attr_map_shared+0x34/0x40
> [xfs]
> [ 2095.237682]  [<ffffffffa02ccfaf>] xfs_attr_get+0xdf/0x1b0 [xfs]
> [ 2095.244312]  [<ffffffffa0341bfc>] xfs_xattr_get+0x4c/0x70 [xfs]
> [ 2095.250924]  [<ffffffff812ad269>] generic_getxattr+0x59/0x70
> [ 2095.257244]  [<ffffffff812acf9b>] vfs_getxattr+0x8b/0xb0
> [ 2095.263177]  [<ffffffffa0435bd6>] ovl_xattr_get+0x46/0x60 [overlay]
> [ 2095.270176]  [<ffffffffa04331aa>] ovl_other_xattr_get+0x1a/0x20 [overlay]
> [ 2095.277756]  [<ffffffff812ad269>] generic_getxattr+0x59/0x70
> [ 2095.284079]  [<ffffffff81345e9e>] cap_inode_need_killpriv+0x2e/0x40
> [ 2095.291078]  [<ffffffff81349a33>] security_inode_need_killpriv+0x33/0x50
> [ 2095.298560]  [<ffffffff812a2fb0>] dentry_needs_remove_privs+0x30/0x50
> [ 2095.305743]  [<ffffffff8127ea21>] do_truncate+0x51/0xc0
> [ 2095.311581]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
> [ 2095.318094]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
> [ 2095.324609]  [<ffffffff8127edde>] do_sys_ftruncate.constprop.15+0xfe/0x160
> [ 2095.332286]  [<ffffffff8127ee7e>] SyS_ftruncate+0xe/0x10
> [ 2095.338225]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2095.344339]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2095.351531] 2 locks held by trinity-c5/3131:
> [ 2095.356297]  #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff81284be1>]
> __sb_start_write+0xd1/0xf0
> [ 2095.365983]  #1:  (&xfs_nondir_ilock_class){++++..}, at:
> [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.376647] INFO: task trinity-c6:3132 blocked for more than 120 seconds.
> [ 2095.384216]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2095.390535] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2095.399275] trinity-c6      D ffff88044da5fd30 13312  3132   3124
> 0x00000084
> [ 2095.407177]  ffff88044da5fd30 ffffffff00000000 ffff880400000000
> ffff880459858000
> [ 2095.415485]  ffff88045899c000 ffff88044da60000 ffff880443755670
> ffff880443755658
> [ 2095.423789]  ffffffff00000000 ffff88045899c000 ffff88044da5fd48
> ffffffff817cdaaf
> [ 2095.432094] Call Trace:
> [ 2095.434825]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2095.440372]  [<ffffffff817d2782>] rwsem_down_write_failed+0x242/0x4b0
> [ 2095.447565]  [<ffffffff817d25ac>] ? rwsem_down_write_failed+0x6c/0x4b0
> [ 2095.454854]  [<ffffffff813e27b7>] call_rwsem_down_write_failed+0x17/0x30
> [ 2095.462337]  [<ffffffff817d1bff>] down_write+0x5f/0x80
> [ 2095.468077]  [<ffffffff8127e413>] ? chmod_common+0x63/0x150
> [ 2095.474300]  [<ffffffff8127e413>] chmod_common+0x63/0x150
> [ 2095.480327]  [<ffffffff8117729f>] ? __audit_syscall_entry+0xaf/0x100
> [ 2095.487421]  [<ffffffff810035cc>] ? syscall_trace_enter+0x1dc/0x390
> [ 2095.494418]  [<ffffffff8127f5f2>] SyS_fchmod+0x52/0x80
> [ 2095.500155]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2095.506270]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2095.513452] 2 locks held by trinity-c6/3132:
> [ 2095.518217]  #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff81284be1>]
> __sb_start_write+0xd1/0xf0
> [ 2095.527895]  #1:  (&sb->s_type->i_mutex_key#17){++++++}, at:
> [<ffffffff8127e413>] chmod_common+0x63/0x150
> [ 2095.538648] INFO: task trinity-c7:3133 blocked for more than 120 seconds.
> [ 2095.546227]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2095.552544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2095.561288] trinity-c7      D ffff88044d393d10 13472  3133   3124
> 0x00000084
> [ 2095.569188]  ffff88044d393d10 ffff880443769fe8 ffff880400000000
> ffff88086ce68000
> [ 2095.577491]  ffff88045899e000 ffff88044d394000 ffff880443769fd0
> ffff88044d393d40
> [ 2095.585796]  ffff880443769fe8 ffff88044376a158 ffff88044d393d28
> ffffffff817cdaaf
> [ 2095.594103] Call Trace:
> [ 2095.596836]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2095.602379]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2095.609491]  [<ffffffffa0322cca>] ? xfs_file_fsync+0xea/0x2e0 [xfs]
> [ 2095.616490]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
> [ 2095.623877]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
> [ 2095.630212]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.636740]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.643076]  [<ffffffffa0322cca>] xfs_file_fsync+0xea/0x2e0 [xfs]
> [ 2095.649889]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
> [ 2095.656109]  [<ffffffff812bdc8d>] do_fsync+0x3d/0x70
> [ 2095.661653]  [<ffffffff812bdf40>] SyS_fsync+0x10/0x20
> [ 2095.667291]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2095.673417]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2095.680610] 1 lock held by trinity-c7/3133:
> [ 2095.685281]  #0:  (&xfs_nondir_ilock_class){++++..}, at:
> [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.695947] INFO: task trinity-c8:3135 blocked for more than 120 seconds.
> [ 2095.703530]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2095.709848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2095.718590] trinity-c8      D ffff88044d3b3d10 12912  3135   3124
> 0x00000084
> [ 2095.726470]  ffff88044d3b3d10 ffff880443769fe8 ffff880400000000
> ffff88046ca30000
> [ 2095.734775]  ffff88044d3a8000 ffff88044d3b4000 ffff880443769fd0
> ffff88044d3b3d40
> [ 2095.743083]  ffff880443769fe8 ffff88044376a158 ffff88044d3b3d28
> ffffffff817cdaaf
> [ 2095.751387] Call Trace:
> [ 2095.754119]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2095.759662]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2095.766772]  [<ffffffffa0322cca>] ? xfs_file_fsync+0xea/0x2e0 [xfs]
> [ 2095.773763]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
> [ 2095.781148]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
> [ 2095.787482]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.794013]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.800347]  [<ffffffffa0322cca>] xfs_file_fsync+0xea/0x2e0 [xfs]
> [ 2095.807155]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
> [ 2095.813377]  [<ffffffff812bdc8d>] do_fsync+0x3d/0x70
> [ 2095.818921]  [<ffffffff812bdf63>] SyS_fdatasync+0x13/0x20
> [ 2095.824949]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2095.831074]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2095.838261] 1 lock held by trinity-c8/3135:
> [ 2095.842930]  #0:  (&xfs_nondir_ilock_class){++++..}, at:
> [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.853588] INFO: task trinity-c9:3136 blocked for more than 120 seconds.
> [ 2095.861167]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
> [ 2095.867485] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 2095.876228] trinity-c9      D ffff88045b3679e0 13328  3136   3124
> 0x00000084
> [ 2095.884111]  ffff88045b3679e0 ffff880443769fe8 ffff880400000000
> ffff88086ce56000
> [ 2095.892417]  ffff88044d3aa000 ffff88045b368000 ffff880443769fd0
> ffff88045b367a10
> [ 2095.900721]  ffff880443769fe8 ffff88044376a1e8 ffff88045b3679f8
> ffffffff817cdaaf
> [ 2095.909024] Call Trace:
> [ 2095.911761]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
> [ 2095.917305]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2095.924414]  [<ffffffffa0333790>] ? xfs_ilock_data_map_shared+0x30/0x40
> [xfs]
> [ 2095.932383]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
> [ 2095.939768]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
> [ 2095.946104]  [<ffffffffa03335fa>] ? xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.952632]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.958968]  [<ffffffffa0333790>] xfs_ilock_data_map_shared+0x30/0x40
> [xfs]
> [ 2095.966752]  [<ffffffffa03128c6>] __xfs_get_blocks+0x96/0x9d0 [xfs]
> [ 2095.973753]  [<ffffffff8126462e>] ?
> mem_cgroup_event_ratelimit.isra.39+0x3e/0xb0
> [ 2095.982012]  [<ffffffff8126e8e5>] ? mem_cgroup_commit_charge+0x95/0x110
> [ 2095.989413]  [<ffffffffa0313214>] xfs_get_blocks+0x14/0x20 [xfs]
> [ 2095.996122]  [<ffffffff812cca44>] do_mpage_readpage+0x474/0x800
> [ 2096.002745]  [<ffffffffa0313200>] ? __xfs_get_blocks+0x9d0/0x9d0 [xfs]
> [ 2096.010037]  [<ffffffff81402fd7>] ? debug_smp_processor_id+0x17/0x20
> [ 2096.017136]  [<ffffffff811f3565>] ? __lru_cache_add+0x75/0xb0
> [ 2096.023551]  [<ffffffff811f45fe>] ? lru_cache_add+0xe/0x10
> [ 2096.029678]  [<ffffffff812ccf0d>] mpage_readpages+0x13d/0x1b0
> [ 2096.036109]  [<ffffffffa0313200>] ? __xfs_get_blocks+0x9d0/0x9d0 [xfs]
> [ 2096.043420]  [<ffffffffa0313200>] ? __xfs_get_blocks+0x9d0/0x9d0 [xfs]
> [ 2096.050724]  [<ffffffffa0311f14>] xfs_vm_readpages+0x54/0x170 [xfs]
> [ 2096.057724]  [<ffffffff811f1a1d>] __do_page_cache_readahead+0x2ad/0x370
> [ 2096.065113]  [<ffffffff811f18ec>] ? __do_page_cache_readahead+0x17c/0x370
> [ 2096.072693]  [<ffffffff8117729f>] ? __audit_syscall_entry+0xaf/0x100
> [ 2096.079787]  [<ffffffff811f2014>] force_page_cache_readahead+0x94/0xf0
> [ 2096.087077]  [<ffffffff811f2168>] SyS_readahead+0xa8/0xc0
> [ 2096.093106]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2096.099234]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2096.106427] 1 lock held by trinity-c9/3136:
> [ 2096.111097]  #0:  (&xfs_nondir_ilock_class){++++..}, at:
> [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
CAI Qian Oct. 7, 2016, 6:56 p.m. UTC | #12
----- Original Message -----
> From: "CAI Qian" <caiqian@redhat.com>
> To: "Jan Kara" <jack@suse.cz>, "Miklos Szeredi" <miklos@szeredi.hu>, "tj" <tj@kernel.org>, "Al Viro"
> <viro@ZenIV.linux.org.uk>, "Linus Torvalds" <torvalds@linux-foundation.org>, "Dave Chinner" <david@fromorbit.com>
> Cc: "linux-xfs" <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>, "Nick Piggin" <npiggin@gmail.com>,
> linux-fsdevel@vger.kernel.org, "Dave Jones" <davej@codemonkey.org.uk>
> Sent: Friday, October 7, 2016 11:27:55 AM
> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> 
> 
> 
> > Hmm, this round of trinity triggered a different hang.
> This hang is reproducible so far with the command below on a overlayfs/xfs,
Another data point is that this hang can also be reproduced using device-mapper thinp
as the docker backend.
    CAI Qian

[12047.714409] INFO: task trinity-c0:3716 blocked for more than 120 seconds.
[12047.722033]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12047.728354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12047.737107] trinity-c0      D ffff8804507dbd10 13552  3716   3713 0x00000084
[12047.744997]  ffff8804507dbd10 ffff8804240e9368 ffff880400000000 ffffffff81c0d540
[12047.753300]  ffff88044c430000 ffff8804507dc000 ffff8804240e9350 ffff8804507dbd40
[12047.761598]  ffff8804240e9368 ffff8804240e94d8 ffff8804507dbd28 ffffffff817cdaaf
[12047.769898] Call Trace:
[12047.772631]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12047.778174]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
[12047.785303]  [<ffffffffa028ccca>] ? xfs_file_fsync+0xea/0x2e0 [xfs]
[12047.792309]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
[12047.799695]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
[12047.806029]  [<ffffffffa029d5fa>] ? xfs_ilock+0xfa/0x260 [xfs]
[12047.812554]  [<ffffffffa029d5fa>] xfs_ilock+0xfa/0x260 [xfs]
[12047.818887]  [<ffffffffa028ccca>] xfs_file_fsync+0xea/0x2e0 [xfs]
[12047.825693]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
[12047.831915]  [<ffffffff812bdc8d>] do_fsync+0x3d/0x70
[12047.837455]  [<ffffffff812bdf63>] SyS_fdatasync+0x13/0x20
[12047.843485]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12047.849609]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12047.856801] 1 lock held by trinity-c0/3716:
[12047.861470]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa029d5fa>] xfs_ilock+0xfa/0x260 [xfs]
[12047.872125] INFO: task trinity-c1:3717 blocked for more than 120 seconds.
[12047.879703]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12047.886011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12047.894749] trinity-c1      D ffff8804507ffd10 13568  3717   3713 0x00000084
[12047.902645]  ffff8804507ffd10 ffff8804240e9368 ffff880400000000 ffff88046c9da000
[12047.910941]  ffff88044c434000 ffff880450800000 ffff8804240e9350 ffff8804507ffd40
[12047.919240]  ffff8804240e9368 ffff8804240e94d8 ffff8804507ffd28 ffffffff817cdaaf
[12047.927542] Call Trace:
[12047.930284]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12047.935826]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
[12047.942933]  [<ffffffffa028ccca>] ? xfs_file_fsync+0xea/0x2e0 [xfs]
[12047.949930]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
[12047.957315]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
[12047.963647]  [<ffffffffa029d5fa>] ? xfs_ilock+0xfa/0x260 [xfs]
[12047.970171]  [<ffffffffa029d5fa>] xfs_ilock+0xfa/0x260 [xfs]
[12047.976506]  [<ffffffffa028ccca>] xfs_file_fsync+0xea/0x2e0 [xfs]
[12047.983310]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
[12047.989529]  [<ffffffff812bdc8d>] do_fsync+0x3d/0x70
[12047.995070]  [<ffffffff812bdf63>] SyS_fdatasync+0x13/0x20
[12048.001096]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12048.007217]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12048.014407] 1 lock held by trinity-c1/3717:
[12048.019085]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa029d5fa>] xfs_ilock+0xfa/0x260 [xfs]
[12048.029742] INFO: task trinity-c2:3718 blocked for more than 120 seconds.
[12048.037310]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12048.043626] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12048.052365] trinity-c2      D ffff8804586c7df8 13504  3718   3713 0x00000084
[12048.060261]  ffff8804586c7df8 0000000000000006 0000000000000000 ffff88046c9dc000
[12048.068565]  ffff88044c436000 ffff8804586c8000 ffff88044ec7e6f8 ffff88044c436000
[12048.076862]  0000000000000246 00000000ffffffff ffff8804586c7e10 ffffffff817cdaaf
[12048.085163] Call Trace:
[12048.087893]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12048.093434]  [<ffffffff817cdf18>] schedule_preempt_disabled+0x18/0x30
[12048.100627]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
[12048.107237]  [<ffffffff812a5313>] ? __fdget_pos+0x43/0x50
[12048.113262]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[12048.119094]  [<ffffffff81297f53>] SyS_getdents+0x83/0x140
[12048.125120]  [<ffffffff81297cd0>] ? fillonedir+0x100/0x100
[12048.131243]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12048.137357]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12048.144546] 1 lock held by trinity-c2/3718:
[12048.149214]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[12048.158495] INFO: task trinity-c3:3719 blocked for more than 120 seconds.
[12048.166071]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12048.172388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12048.181120] trinity-c3      D ffff880450707c60 13552  3719   3713 0x00000084
[12048.189013]  ffff880450707c60 ffffffff00000000 ffff880400000000 ffff88046ca10000
[12048.197313]  ffff88044c432000 ffff880450708000 ffff8804240e9658 ffff8804240e9640
[12048.205612]  ffffffff00000000 ffff88044c432000 ffff880450707c78 ffffffff817cdaaf
[12048.213912] Call Trace:
[12048.216643]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12048.222183]  [<ffffffff817d2782>] rwsem_down_write_failed+0x242/0x4b0
[12048.229374]  [<ffffffff817d25ac>] ? rwsem_down_write_failed+0x6c/0x4b0
[12048.236662]  [<ffffffff813e27b7>] call_rwsem_down_write_failed+0x17/0x30
[12048.244144]  [<ffffffff817d1bff>] down_write+0x5f/0x80
[12048.249881]  [<ffffffff812ad021>] ? vfs_removexattr+0x61/0x120
[12048.256391]  [<ffffffff812ad021>] vfs_removexattr+0x61/0x120
[12048.262709]  [<ffffffff812ad135>] removexattr+0x55/0x80
[12048.268533]  [<ffffffff81402ff3>] ? __this_cpu_preempt_check+0x13/0x20
[12048.275811]  [<ffffffff810f8eae>] ? update_fast_ctr+0x4e/0x70
[12048.282225]  [<ffffffff810f8f57>] ? percpu_down_read+0x57/0x90
[12048.288728]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
[12048.295230]  [<ffffffff810cc367>] ? preempt_count_add+0x47/0xc0
[12048.301829]  [<ffffffff812a665f>] ? mnt_clone_write+0x3f/0x70
[12048.308242]  [<ffffffff812a8588>] ? __mnt_want_write_file+0x18/0x30
[12048.315238]  [<ffffffff812a85d0>] ? mnt_want_write_file+0x30/0x60
[12048.322039]  [<ffffffff812ae303>] SyS_fremovexattr+0x83/0xb0
[12048.328356]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12048.334478]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12048.341679] 2 locks held by trinity-c3/3719:
[12048.346454]  #0:  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
[12048.356042]  #1:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: [<ffffffff812ad021>] vfs_removexattr+0x61/0x120
[12048.367079] INFO: task trinity-c4:3720 blocked for more than 120 seconds.
[12048.374655]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12048.380972] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12048.389712] trinity-c4      D ffff88045072be08 13536  3720   3713 0x00000084
[12048.397606]  ffff88045072be08 0000000000000006 0000000000000000 ffff88046c9fe000
[12048.405902]  ffff880450720000 ffff88045072c000 ffff88044ec7e6f8 ffff880450720000
[12048.414205]  0000000000000246 00000000ffffffff ffff88045072be20 ffffffff817cdaaf
[12048.422505] Call Trace:
[12048.425235]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12048.430767]  [<ffffffff817cdf18>] schedule_preempt_disabled+0x18/0x30
[12048.437957]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
[12048.444565]  [<ffffffff812a5313>] ? __fdget_pos+0x43/0x50
[12048.450591]  [<ffffffff8117729f>] ? __audit_syscall_entry+0xaf/0x100
[12048.457675]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[12048.463508]  [<ffffffff81298091>] SyS_getdents64+0x81/0x130
[12048.469720]  [<ffffffff81297a80>] ? iterate_dir+0x190/0x190
[12048.475939]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12048.482063]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12048.489243] 1 lock held by trinity-c4/3720:
[12048.493913]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[12048.503182] INFO: task trinity-c5:3721 blocked for more than 120 seconds.
[12048.510757]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12048.517071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12048.525812] trinity-c5      D ffff8804510a7e08 13552  3721   3713 0x00000084
[12048.533706]  ffff8804510a7e08 0000000000000006 0000000000000000 ffff88046c9fa000
[12048.542007]  ffff880450722000 ffff8804510a8000 ffff88044ec7e6f8 ffff880450722000
[12048.550310]  0000000000000246 00000000ffffffff ffff8804510a7e20 ffffffff817cdaaf
[12048.558610] Call Trace:
[12048.561339]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12048.566879]  [<ffffffff817cdf18>] schedule_preempt_disabled+0x18/0x30
[12048.574070]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
[12048.580677]  [<ffffffff812a5313>] ? __fdget_pos+0x43/0x50
[12048.586703]  [<ffffffff8117729f>] ? __audit_syscall_entry+0xaf/0x100
[12048.593796]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[12048.599629]  [<ffffffff81298091>] SyS_getdents64+0x81/0x130
[12048.605849]  [<ffffffff81297a80>] ? iterate_dir+0x190/0x190
[12048.612069]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12048.618191]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12048.625382] 1 lock held by trinity-c5/3721:
[12048.630049]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>] __fdget_pos+0x43/0x50
[12048.639329] INFO: task trinity-c6:3722 blocked for more than 120 seconds.
[12048.646903]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12048.653219] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12048.661958] trinity-c6      D ffff88044f0ebc50 12224  3722   3713 0x00000084
[12048.669849]  ffff88044f0ebc50 ffff8804240e9368 ffff880400000000 ffff88046c9fc000
[12048.678149]  ffff880450724000 ffff88044f0ec000 ffff8804240e9350 ffff88044f0ebc80
[12048.686448]  ffff8804240e9368 ffff8804240e92c0 ffff88044f0ebc68 ffffffff817cdaaf
[12048.694750] Call Trace:
[12048.697478]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12048.703018]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
[12048.710126]  [<ffffffffa029d7d4>] ? xfs_ilock_attr_map_shared+0x34/0x40 [xfs]
[12048.718095]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
[12048.725479]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
[12048.731800]  [<ffffffffa029d5fa>] ? xfs_ilock+0xfa/0x260 [xfs]
[12048.738337]  [<ffffffffa029d5fa>] xfs_ilock+0xfa/0x260 [xfs]
[12048.744669]  [<ffffffffa029d7d4>] xfs_ilock_attr_map_shared+0x34/0x40 [xfs]
[12048.752457]  [<ffffffffa0280801>] xfs_attr_list_int+0x71/0x690 [xfs]
[12048.759555]  [<ffffffff810cba89>] ? __might_sleep+0x49/0x80
[12048.765792]  [<ffffffffa02abf2a>] xfs_vn_listxattr+0x7a/0xb0 [xfs]
[12048.772707]  [<ffffffffa02abcc0>] ? __xfs_xattr_put_listent+0xa0/0xa0 [xfs]
[12048.780480]  [<ffffffff812ad582>] vfs_listxattr+0x42/0x70
[12048.786517]  [<ffffffff812ad68e>] listxattr+0xde/0xf0
[12048.792156]  [<ffffffff812ae1f6>] SyS_flistxattr+0x56/0xa0
[12048.798271]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12048.804404]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12048.811595] 1 lock held by trinity-c6/3722:
[12048.816263]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa029d5fa>] xfs_ilock+0xfa/0x260 [xfs]
[12048.826935] INFO: task trinity-c7:3723 blocked for more than 120 seconds.
[12048.834516]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12048.840832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12048.849572] trinity-c7      D ffff88044fc23c50 13552  3723   3713 0x00000084
[12048.857469]  ffff88044fc23c50 ffff8804240e9368 ffff880400000000 ffff88046c9f8000
[12048.865768]  ffff880450726000 ffff88044fc24000 ffff8804240e9350 ffff88044fc23c80
[12048.874067]  ffff8804240e9368 ffff8804240e92c0 ffff88044fc23c68 ffffffff817cdaaf
[12048.882370] Call Trace:
[12048.885100]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12048.890634]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
[12048.897741]  [<ffffffffa029d7d4>] ? xfs_ilock_attr_map_shared+0x34/0x40 [xfs]
[12048.905707]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
[12048.913081]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
[12048.919412]  [<ffffffffa029d5fa>] ? xfs_ilock+0xfa/0x260 [xfs]
[12048.925937]  [<ffffffffa029d5fa>] xfs_ilock+0xfa/0x260 [xfs]
[12048.932267]  [<ffffffffa029d7d4>] xfs_ilock_attr_map_shared+0x34/0x40 [xfs]
[12048.940053]  [<ffffffffa0280801>] xfs_attr_list_int+0x71/0x690 [xfs]
[12048.947146]  [<ffffffff810cba89>] ? __might_sleep+0x49/0x80
[12048.953374]  [<ffffffffa02abf2a>] xfs_vn_listxattr+0x7a/0xb0 [xfs]
[12048.960288]  [<ffffffffa02abcc0>] ? __xfs_xattr_put_listent+0xa0/0xa0 [xfs]
[12048.968060]  [<ffffffff812ad582>] vfs_listxattr+0x42/0x70
[12048.974088]  [<ffffffff812ad602>] listxattr+0x52/0xf0
[12048.979726]  [<ffffffff812ae1f6>] SyS_flistxattr+0x56/0xa0
[12048.985849]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12048.991973]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12048.999162] 1 lock held by trinity-c7/3723:
[12049.003831]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa029d5fa>] xfs_ilock+0xfa/0x260 [xfs]
[12049.014481] INFO: task trinity-c8:3724 blocked for more than 120 seconds.
[12049.022072]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12049.028389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12049.037130] trinity-c8      D ffff88044fc3fc60 13504  3724   3713 0x00000084
[12049.045023]  ffff88044fc3fc60 ffffffff00000000 ffff880400000000 ffff88046ca14000
[12049.053324]  ffff88044e540000 ffff88044fc40000 ffff8804240e9368 ffff8804240e9350
[12049.061623]  ffffffff00000000 ffff88044e540000 ffff88044fc3fc78 ffffffff817cdaaf
[12049.069924] Call Trace:
[12049.072654]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12049.078208]  [<ffffffff817d2782>] rwsem_down_write_failed+0x242/0x4b0
[12049.085408]  [<ffffffff817d25ac>] ? rwsem_down_write_failed+0x6c/0x4b0
[12049.092734]  [<ffffffffa028d7cc>] ? xfs_update_prealloc_flags+0x6c/0x100 [xfs]
[12049.100798]  [<ffffffff813e27b7>] call_rwsem_down_write_failed+0x17/0x30
[12049.108290]  [<ffffffff810f8c15>] down_write_nested+0x65/0x80
[12049.114742]  [<ffffffffa029d68e>] ? xfs_ilock+0x18e/0x260 [xfs]
[12049.121377]  [<ffffffffa029d68e>] xfs_ilock+0x18e/0x260 [xfs]
[12049.127819]  [<ffffffffa028d7cc>] xfs_update_prealloc_flags+0x6c/0x100 [xfs]
[12049.135714]  [<ffffffffa028da8e>] xfs_file_fallocate+0x22e/0x360 [xfs]
[12049.143004]  [<ffffffff810f8eae>] ? update_fast_ctr+0x4e/0x70
[12049.149435]  [<ffffffff810f8f57>] ? percpu_down_read+0x57/0x90
[12049.155958]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
[12049.162492]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
[12049.169016]  [<ffffffff8127e000>] vfs_fallocate+0x140/0x230
[12049.175249]  [<ffffffff8127eee4>] SyS_fallocate+0x44/0x70
[12049.181288]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12049.187423]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12049.194667] 5 locks held by trinity-c8/3724:
[12049.199429]  #0:  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
[12049.209024]  #1:  (&(&ip->i_iolock)->mr_lock){++++++}, at: [<ffffffffa029d654>] xfs_ilock+0x154/0x260 [xfs]
[12049.219990]  #2:  (&(&ip->i_mmaplock)->mr_lock){+++++.}, at: [<ffffffffa029d674>] xfs_ilock+0x174/0x260 [xfs]
[12049.231128]  #3:  (sb_internal){.+.+.+}, at: [<ffffffff81284b8b>] __sb_start_write+0x7b/0xf0
[12049.240620]  #4:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa029d68e>] xfs_ilock+0x18e/0x260 [xfs]
[12049.251383] INFO: task trinity-c9:3725 blocked for more than 120 seconds.
[12049.258959]       Not tainted 4.8.0-rc8-splice-fixw-proc+ #4
[12049.265287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12049.274027] trinity-c9      D ffff88044f043d30 13552  3725   3713 0x00000084
[12049.281922]  ffff88044f043d30 ffffffff00000000 ffff880400000000 ffff88046ca14000
[12049.290238]  ffff88044e542000 ffff88044f044000 ffff8804240e9658 ffff8804240e9640
[12049.298539]  ffffffff00000000 ffff88044e542000 ffff88044f043d48 ffffffff817cdaaf
[12049.306840] Call Trace:
[12049.309569]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[12049.315122]  [<ffffffff817d2782>] rwsem_down_write_failed+0x242/0x4b0
[12049.322327]  [<ffffffff817d25ac>] ? rwsem_down_write_failed+0x6c/0x4b0
[12049.329625]  [<ffffffff813e27b7>] call_rwsem_down_write_failed+0x17/0x30
[12049.337118]  [<ffffffff817d1bff>] down_write+0x5f/0x80
[12049.342864]  [<ffffffff8127e413>] ? chmod_common+0x63/0x150
[12049.349096]  [<ffffffff8127e413>] chmod_common+0x63/0x150
[12049.355131]  [<ffffffff8117729f>] ? __audit_syscall_entry+0xaf/0x100
[12049.362236]  [<ffffffff810035cc>] ? syscall_trace_enter+0x1dc/0x390
[12049.369243]  [<ffffffff8127f5f2>] SyS_fchmod+0x52/0x80
[12049.374988]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
[12049.381124]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
[12049.388324] 2 locks held by trinity-c9/3725:
[12049.393100]  #0:  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
[12049.402705]  #1:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: [<ffffffff8127e413>] chmod_common+0x63/0x150
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Oct. 9, 2016, 9:51 p.m. UTC | #13
On Fri, Oct 07, 2016 at 10:43:18AM -0400, CAI Qian wrote:
> Hmm, this round of trinity triggered a different hang.
> 
> [ 2094.487964]  [<ffffffff813e27b7>] call_rwsem_down_write_failed+0x17/0x30
> [ 2094.495450]  [<ffffffff817d1bff>] down_write+0x5f/0x80
> [ 2094.508284]  [<ffffffff8127e301>] chown_common.isra.12+0x131/0x1e0
> [ 2094.553784] 2 locks held by trinity-c0/3126:
> [ 2094.558552]  #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
> [ 2094.568240]  #1:  (&sb->s_type->i_mutex_key#17){++++++}, at: [<ffffffff8127e301>] chown_common.isra.12+0x131/0x1e0

Waiting on i_mutex.

> [ 2094.643597]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2094.665119]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
> [ 2094.691133]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
> [ 2094.721844] 1 lock held by trinity-c1/3127:
> [ 2094.726515]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]

Waiting on i_ilock.

> [ 2094.808078]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
> [ 2094.820715]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
> [ 2094.826544]  [<ffffffff81297f53>] SyS_getdents+0x83/0x140
> [ 2094.856682]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>] __fdget_pos+0x43/0x50

concurrent readdir on the same directory fd, blocked on fd.

> [ 2094.936885]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
> [ 2094.956620]  [<ffffffff812a5313>] __fdget_pos+0x43/0x50
> [ 2094.962454]  [<ffffffff81298091>] SyS_getdents64+0x81/0x130
> [ 2094.988204] 1 lock held by trinity-c3/3129:
> [ 2094.992872]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>] __fdget_pos+0x43/0x50

Same.

> [ 2095.073118]  [<ffffffff817cf4df>] mutex_lock_nested+0x19f/0x450
> [ 2095.091589]  [<ffffffff812811dd>] SyS_lseek+0x1d/0xb0
> [ 2095.097229]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2095.110547] 1 lock held by trinity-c4/3130:
> [ 2095.115216]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812a5313>] __fdget_pos+0x43/0x50

Concurrent lseek on directory fd, blocked on fd.


> [ 2095.188230]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2095.223558]  [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]
> [ 2095.229894]  [<ffffffffa03337d4>] xfs_ilock_attr_map_shared+0x34/0x40 [xfs]
> [ 2095.237682]  [<ffffffffa02ccfaf>] xfs_attr_get+0xdf/0x1b0 [xfs]
> [ 2095.244312]  [<ffffffffa0341bfc>] xfs_xattr_get+0x4c/0x70 [xfs]
> [ 2095.250924]  [<ffffffff812ad269>] generic_getxattr+0x59/0x70
> [ 2095.257244]  [<ffffffff812acf9b>] vfs_getxattr+0x8b/0xb0
> [ 2095.263177]  [<ffffffffa0435bd6>] ovl_xattr_get+0x46/0x60 [overlay]
> [ 2095.270176]  [<ffffffffa04331aa>] ovl_other_xattr_get+0x1a/0x20 [overlay]
> [ 2095.277756]  [<ffffffff812ad269>] generic_getxattr+0x59/0x70
> [ 2095.284079]  [<ffffffff81345e9e>] cap_inode_need_killpriv+0x2e/0x40
> [ 2095.291078]  [<ffffffff81349a33>] security_inode_need_killpriv+0x33/0x50
> [ 2095.298560]  [<ffffffff812a2fb0>] dentry_needs_remove_privs+0x30/0x50
> [ 2095.305743]  [<ffffffff8127ea21>] do_truncate+0x51/0xc0
> [ 2095.311581]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
> [ 2095.318094]  [<ffffffff81284be1>] ? __sb_start_write+0xd1/0xf0
> [ 2095.324609]  [<ffffffff8127edde>] do_sys_ftruncate.constprop.15+0xfe/0x160
> [ 2095.332286]  [<ffffffff8127ee7e>] SyS_ftruncate+0xe/0x10
> [ 2095.338225]  [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> [ 2095.344339]  [<ffffffff817d4a3f>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 2095.351531] 2 locks held by trinity-c5/3131:
> [ 2095.356297]  #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
> [ 2095.365983]  #1:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]

truncate on overlay, removing xattrs from XFS file, blocked on
i_ilock.

> [ 2095.440372]  [<ffffffff817d2782>] rwsem_down_write_failed+0x242/0x4b0
> [ 2095.474300]  [<ffffffff8127e413>] chmod_common+0x63/0x150
> [ 2095.513452] 2 locks held by trinity-c6/3132:
> [ 2095.518217]  #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
> [ 2095.527895]  #1:  (&sb->s_type->i_mutex_key#17){++++++}, at: [<ffffffff8127e413>] chmod_common+0x63/0x150

chmod, blocked on i_mutex.

> [ 2095.602379]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2095.616490]  [<ffffffff813e2788>] call_rwsem_down_read_failed+0x18/0x30
> [ 2095.623877]  [<ffffffff810f8b0b>] down_read_nested+0x5b/0x80
> [ 2095.649889]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
> [ 2095.680610] 1 lock held by trinity-c7/3133:
> [ 2095.685281]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]

fsync on file, blocked on i_ilock.

> [ 2095.759662]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2095.807155]  [<ffffffff812bdbbd>] vfs_fsync_range+0x3d/0xb0
> [ 2095.813377]  [<ffffffff812bdc8d>] do_fsync+0x3d/0x70
> [ 2095.818921]  [<ffffffff812bdf63>] SyS_fdatasync+0x13/0x20
> [ 2095.838261] 1 lock held by trinity-c8/3135:
> [ 2095.842930]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]

ditto.

> [ 2095.917305]  [<ffffffff817d24b7>] rwsem_down_read_failed+0x107/0x190
> [ 2095.958968]  [<ffffffffa0333790>] xfs_ilock_data_map_shared+0x30/0x40 [xfs]
> [ 2095.966752]  [<ffffffffa03128c6>] __xfs_get_blocks+0x96/0x9d0 [xfs]
> [ 2095.989413]  [<ffffffffa0313214>] xfs_get_blocks+0x14/0x20 [xfs]
> [ 2095.996122]  [<ffffffff812cca44>] do_mpage_readpage+0x474/0x800
> [ 2096.029678]  [<ffffffff812ccf0d>] mpage_readpages+0x13d/0x1b0
> [ 2096.050724]  [<ffffffffa0311f14>] xfs_vm_readpages+0x54/0x170 [xfs]
> [ 2096.057724]  [<ffffffff811f1a1d>] __do_page_cache_readahead+0x2ad/0x370
> [ 2096.079787]  [<ffffffff811f2014>] force_page_cache_readahead+0x94/0xf0
> [ 2096.087077]  [<ffffffff811f2168>] SyS_readahead+0xa8/0xc0
> [ 2096.106427] 1 lock held by trinity-c9/3136:
> [ 2096.111097]  #0:  (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa03335fa>] xfs_ilock+0xfa/0x260 [xfs]

readhead blocking in i_ilock before reading in extents.

Nothing here indicates a deadlock. Everything is waiting for locks,
but nothing is holding locks in a way that indicates that progress
is not being made. This sort of thing can happen when slow storage
is massively overloaded - sysrq-w is really the only way to get a
better picutre of what is happening here, but so far there's no
concrete evidence of a hang from this output.

Cheers,

Dave.
Dave Chinner Oct. 9, 2016, 9:54 p.m. UTC | #14
On Fri, Oct 07, 2016 at 02:56:22PM -0400, CAI Qian wrote:
> 
> 
> ----- Original Message -----
> > From: "CAI Qian" <caiqian@redhat.com>
> > To: "Jan Kara" <jack@suse.cz>, "Miklos Szeredi" <miklos@szeredi.hu>, "tj" <tj@kernel.org>, "Al Viro"
> > <viro@ZenIV.linux.org.uk>, "Linus Torvalds" <torvalds@linux-foundation.org>, "Dave Chinner" <david@fromorbit.com>
> > Cc: "linux-xfs" <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>, "Nick Piggin" <npiggin@gmail.com>,
> > linux-fsdevel@vger.kernel.org, "Dave Jones" <davej@codemonkey.org.uk>
> > Sent: Friday, October 7, 2016 11:27:55 AM
> > Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> > 
> > 
> > 
> > > Hmm, this round of trinity triggered a different hang.
> > This hang is reproducible so far with the command below on a overlayfs/xfs,
> Another data point is that this hang can also be reproduced using device-mapper thinp
> as the docker backend.

Again, no evidence that the system is actually hung. Waiting on
locks, yes, but nothing to indicate there is a deadlock in those
waiters.

Cheers,

Dave.
CAI Qian Oct. 10, 2016, 2:10 p.m. UTC | #15
----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: "Jan Kara" <jack@suse.cz>, "Miklos Szeredi" <miklos@szeredi.hu>, "tj" <tj@kernel.org>, "Al Viro"
> <viro@ZenIV.linux.org.uk>, "Linus Torvalds" <torvalds@linux-foundation.org>, "linux-xfs"
> <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>, "Nick Piggin" <npiggin@gmail.com>,
> linux-fsdevel@vger.kernel.org, "Dave Jones" <davej@codemonkey.org.uk>
> Sent: Sunday, October 9, 2016 5:54:55 PM
> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> 
> Again, no evidence that the system is actually hung. Waiting on
> locks, yes, but nothing to indicate there is a deadlock in those
> waiters.
Here you are,

http://people.redhat.com/qcai/tmp/dmesg

    CAI Qian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
CAI Qian Oct. 10, 2016, 8:14 p.m. UTC | #16
> Here you are,
> 
> http://people.redhat.com/qcai/tmp/dmesg
Also, this turned out to be a regression and bisecting so far pointed out this commit,

commit 5d50ac70fe98518dbf620bfba8184254663125eb
Merge: 31c1feb 4e14e49
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Nov 11 20:18:48 2015 -0800

    Merge tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/g
    
    Pull xfs updates from Dave Chinner:
     "There is nothing really major here - the only significant addition is
      the per-mount operation statistics infrastructure.  Otherwises there's
      various ACL, xattr, DAX, AIO and logging fixes, and a smattering of
      small cleanups and fixes elsewhere.
    
      Summary:
    
       - per-mount operational statistics in sysfs
       - fixes for concurrent aio append write submission
       - various logging fixes
       - detection of zeroed logs and invalid log sequence numbers on v5 filesys
       - memory allocation failure message improvements
       - a bunch of xattr/ACL fixes
       - fdatasync optimisation
       - miscellaneous other fixes and cleanups"

    * tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/d
      xfs: give all workqueues rescuer threads
      xfs: fix log recovery op header validation assert
      xfs: Fix error path in xfs_get_acl
      xfs: optimise away log forces on timestamp updates for fdatasync
      xfs: don't leak uuid table on rmmod
      xfs: invalidate cached acl if set via ioctl
      xfs: Plug memory leak in xfs_attrmulti_attr_set
      xfs: Validate the length of on-disk ACLs
      xfs: invalidate cached acl if set directly via xattr
      xfs: xfs_filemap_pmd_fault treats read faults as write faults
      xfs: add ->pfn_mkwrite support for DAX
      xfs: DAX does not use IO completion callbacks
      xfs: Don't use unwritten extents for DAX
      xfs: introduce BMAPI_ZERO for allocating zeroed extents
      xfs: fix inode size update overflow in xfs_map_direct()
      xfs: clear PF_NOFREEZE for xfsaild kthread
      xfs: fix an error code in xfs_fs_fill_super()
      xfs: stats are no longer dependent on CONFIG_PROC_FS
      xfs: simplify /proc teardown & error handling
      xfs: per-filesystem stats counter implementation
      ...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Oct. 10, 2016, 9:57 p.m. UTC | #17
On Mon, Oct 10, 2016 at 10:10:29AM -0400, CAI Qian wrote:
> 
> 
> ----- Original Message -----
> > From: "Dave Chinner" <david@fromorbit.com>
> > To: "CAI Qian" <caiqian@redhat.com>
> > Cc: "Jan Kara" <jack@suse.cz>, "Miklos Szeredi" <miklos@szeredi.hu>, "tj" <tj@kernel.org>, "Al Viro"
> > <viro@ZenIV.linux.org.uk>, "Linus Torvalds" <torvalds@linux-foundation.org>, "linux-xfs"
> > <linux-xfs@vger.kernel.org>, "Jens Axboe" <axboe@kernel.dk>, "Nick Piggin" <npiggin@gmail.com>,
> > linux-fsdevel@vger.kernel.org, "Dave Jones" <davej@codemonkey.org.uk>
> > Sent: Sunday, October 9, 2016 5:54:55 PM
> > Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> > 
> > Again, no evidence that the system is actually hung. Waiting on
> > locks, yes, but nothing to indicate there is a deadlock in those
> > waiters.
> Here you are,
> 
> http://people.redhat.com/qcai/tmp/dmesg

It's a page lock order bug in the XFS seek hole/data implementation.

-Dave.
CAI Qian Oct. 12, 2016, 7:50 p.m. UTC | #18
----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> Sent: Monday, October 10, 2016 5:57:14 PM
> 
> > http://people.redhat.com/qcai/tmp/dmesg
> 
> It's a page lock order bug in the XFS seek hole/data implementation.
So reverted this commit against the latest mainline allows trinity run
hours. Otherwise, it always hang at fdatasync() within 30 minutes.

fc0561cefc04e7803c0f6501ca4f310a502f65b8
xfs: optimise away log forces on timestamp updates for fdatasync

PS: tested against the vfs tree's #work.splice_read with this commit
reverted is now hanging at sync() instead which won't be  reproduced
against the mainline so far.
http://people.redhat.com/qcai/tmp/dmesg-sync

   CAI Qian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Oct. 12, 2016, 8:59 p.m. UTC | #19
On Wed, Oct 12, 2016 at 03:50:36PM -0400, CAI Qian wrote:
> 
> 
> ----- Original Message -----
> > From: "Dave Chinner" <david@fromorbit.com>
> > Sent: Monday, October 10, 2016 5:57:14 PM
> > 
> > > http://people.redhat.com/qcai/tmp/dmesg
> > 
> > It's a page lock order bug in the XFS seek hole/data implementation.
> So reverted this commit against the latest mainline allows trinity run
> hours. Otherwise, it always hang at fdatasync() within 30 minutes.
> 
> fc0561cefc04e7803c0f6501ca4f310a502f65b8
> xfs: optimise away log forces on timestamp updates for fdatasync

Has nothing at all to do with the hang.

> PS: tested against the vfs tree's #work.splice_read with this commit
> reverted is now hanging at sync() instead which won't be  reproduced
> against the mainline so far.
> http://people.redhat.com/qcai/tmp/dmesg-sync

It is the same page lock vs seek hole/data issue.

Cheers,

Dave.
CAI Qian Oct. 13, 2016, 4:25 p.m. UTC | #20
----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> Sent: Wednesday, October 12, 2016 4:59:01 PM
> Subject: Re: [bisected] Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> 
> On Wed, Oct 12, 2016 at 03:50:36PM -0400, CAI Qian wrote:
> > 
> > 
> > ----- Original Message -----
> > > From: "Dave Chinner" <david@fromorbit.com>
> > > Sent: Monday, October 10, 2016 5:57:14 PM
> > > 
> > > > http://people.redhat.com/qcai/tmp/dmesg
> > > 
> > > It's a page lock order bug in the XFS seek hole/data implementation.
> > So reverted this commit against the latest mainline allows trinity run
> > hours. Otherwise, it always hang at fdatasync() within 30 minutes.
> > 
> > fc0561cefc04e7803c0f6501ca4f310a502f65b8
> > xfs: optimise away log forces on timestamp updates for fdatasync
> 
> Has nothing at all to do with the hang.
> 
> > PS: tested against the vfs tree's #work.splice_read with this commit
> > reverted is now hanging at sync() instead which won't be  reproduced
> > against the mainline so far.
> > http://people.redhat.com/qcai/tmp/dmesg-sync
> 
> It is the same page lock vs seek hole/data issue.
FYI, CVE-2016-8660 was assigned for it.
   CAI Qian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Oct. 13, 2016, 8:49 p.m. UTC | #21
On Thu, Oct 13, 2016 at 12:25:30PM -0400, CAI Qian wrote:
> 
> 
> ----- Original Message -----
> > From: "Dave Chinner" <david@fromorbit.com>
> > Sent: Wednesday, October 12, 2016 4:59:01 PM
> > Subject: Re: [bisected] Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> > 
> > On Wed, Oct 12, 2016 at 03:50:36PM -0400, CAI Qian wrote:
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "Dave Chinner" <david@fromorbit.com>
> > > > Sent: Monday, October 10, 2016 5:57:14 PM
> > > > 
> > > > > http://people.redhat.com/qcai/tmp/dmesg
> > > > 
> > > > It's a page lock order bug in the XFS seek hole/data implementation.
> > > So reverted this commit against the latest mainline allows trinity run
> > > hours. Otherwise, it always hang at fdatasync() within 30 minutes.
> > > 
> > > fc0561cefc04e7803c0f6501ca4f310a502f65b8
> > > xfs: optimise away log forces on timestamp updates for fdatasync
> > 
> > Has nothing at all to do with the hang.
> > 
> > > PS: tested against the vfs tree's #work.splice_read with this commit
> > > reverted is now hanging at sync() instead which won't be  reproduced
> > > against the mainline so far.
> > > http://people.redhat.com/qcai/tmp/dmesg-sync
> > 
> > It is the same page lock vs seek hole/data issue.
> FYI, CVE-2016-8660 was assigned for it.

Why? This isn't a security issue - CVEs cost time and effort for
everyone to track and follow and raising them for issues like this
does not help anyone fix the actual problem.  It doesn't help us
track it, analyse it, communicate with the bug reporter, test it or
get the fix committed.  It's meaningless to the developers fixing
the code, it's meaningless to users, and it's meaningless to most
distros that are supporting XFS because the distro maintainers don't
watch the CVE lists for XFS bugs they need to backport and fix.

All this does is artificially inflate the supposed importance of the
bug. CVEs are for security or severe issues. This is neither serious
or a security issue - please have the common courtesy to ask the
people with the knowledge to make such a determination (i.e. the
maintainers) before you waste the time of a /large number/ of people
by raising a useless CVE...

Yes, you found a bug. No, it's not a security bug. No, you should
not abusing of the CVE process to apply pressure to get it fixed.
Please don't do this again.

Cheers,

Dave.
CAI Qian Oct. 13, 2016, 8:56 p.m. UTC | #22
----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> Sent: Thursday, October 13, 2016 4:49:17 PM
> Subject: Re: [bisected] Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
>
> Why? This isn't a security issue - CVEs cost time and effort for
> everyone to track and follow and raising them for issues like this
> does not help anyone fix the actual problem.  It doesn't help us
> track it, analyse it, communicate with the bug reporter, test it or
> get the fix committed.  It's meaningless to the developers fixing
> the code, it's meaningless to users, and it's meaningless to most
> distros that are supporting XFS because the distro maintainers don't
> watch the CVE lists for XFS bugs they need to backport and fix.
> 
> All this does is artificially inflate the supposed importance of the
> bug. CVEs are for security or severe issues. This is neither serious
> or a security issue - please have the common courtesy to ask the
> people with the knowledge to make such a determination (i.e. the
> maintainers) before you waste the time of a /large number/ of people
> by raising a useless CVE...
> 
> Yes, you found a bug. No, it's not a security bug. No, you should
> not abusing of the CVE process to apply pressure to get it fixed.
> Please don't do this again.
As far as I can tell, this is a medium-severity security issue that a
non-privileged user can exploit it to cause a system hang/deadlock.
Hence, a local DoS for other users use the system.
   CAI Qian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
CAI Qian Oct. 21, 2016, 3:38 p.m. UTC | #23
----- Original Message -----
> From: "Jan Kara" <jack@suse.cz>
> Sent: Friday, October 7, 2016 3:08:38 AM
> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
> 
> 
> So I believe this may be just a problem in overlayfs lockdep annotation
> (see below). Added Miklos to CC.
> 
> > Wait. There is also a lockep happened before the xfs internal error as
> > well.
> > 
> > [ 5839.452325] ======================================================
> > [ 5839.459221] [ INFO: possible circular locking dependency detected ]
> > [ 5839.466215] 4.8.0-rc8-splice-fixw-proc+ #4 Not tainted
> > [ 5839.471945] -------------------------------------------------------
> > [ 5839.478937] trinity-c220/69531 is trying to acquire lock:
> > [ 5839.484961]  (&p->lock){+.+.+.}, at: [<ffffffff812ac69c>]
> > seq_read+0x4c/0x3e0
> > [ 5839.492967]
> > but task is already holding lock:
> > [ 5839.499476]  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>]
> > __sb_start_write+0xd1/0xf0
> > [ 5839.508560]
> > which lock already depends on the new lock.
> > 
> > [ 5839.517686]
> > the existing dependency chain (in reverse order) is:
> > [ 5839.526036]
> > -> #3 (sb_writers#8){.+.+.+}:
> > [ 5839.530751]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
> > [ 5839.537368]        [<ffffffff810f8f4a>] percpu_down_read+0x4a/0x90
> > [ 5839.544275]        [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
> > [ 5839.551181]        [<ffffffff812a8544>] mnt_want_write+0x24/0x50
> > [ 5839.557892]        [<ffffffffa04a398f>] ovl_want_write+0x1f/0x30
> > [overlay]
> > [ 5839.565577]        [<ffffffffa04a6036>] ovl_do_remove+0x46/0x480
> > [overlay]
> > [ 5839.573259]        [<ffffffffa04a64a3>] ovl_unlink+0x13/0x20 [overlay]
> > [ 5839.580555]        [<ffffffff812918ea>] vfs_unlink+0xda/0x190
> > [ 5839.586979]        [<ffffffff81293698>] do_unlinkat+0x268/0x2b0
> > [ 5839.593599]        [<ffffffff8129419b>] SyS_unlinkat+0x1b/0x30
> > [ 5839.600120]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
> > [ 5839.606836]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
> > [ 5839.614231]
> 
> So here is IMO the real culprit: do_unlinkat() grabs fs freeze protection
> through mnt_want_write(), we grab also i_rwsem in do_unlinkat() in
> I_MUTEX_PARENT class a bit after that and further down in vfs_unlink() we
> grab i_rwsem for the unlinked inode itself in default I_MUTEX class. Then
> in ovl_want_write() we grab freeze protection again, but this time for the
> upper filesystem. That establishes sb_writers (overlay) -> I_MUTEX_PARENT
> (overlay) -> I_MUTEX (overlay) -> sb_writers (FS-A) lock ordering
> (we maintain locking classes per fs type so that's why I'm showing fs type
> in parenthesis).
> 
> Now this nesting is nasty because once you add locks that are not tracked
> per fs type into the mix, you get cycles. In this case we've got
> seq_file->lock and cred_guard_mutex into the mix - the splice path is
> doing sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex (splicing
> from seq_file into the real filesystem). Exec path further establishes
> cred_guard_mutex -> I_MUTEX (overlay) which closes the full cycle:
> 
> sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex -> i_mutex
> (overlay) -> sb_writers (FS-A)
> 
> If I analyzed the lockdep trace, this looks like a real (although remote)
> deadlock possibility. Miklos?

So this can still be reproduced in the yesterday's mainline.

[40581.813575] [ INFO: possible circular locking dependency detected ]
[40581.813578] 4.9.0-rc1-lockfix-uncorev2+ #51 Tainted: G        W      
[40581.813581] -------------------------------------------------------
[40581.813582] trinity-c104/39795 is trying to acquire lock:
[40581.813587]  (
[40581.813588] &p->lock
[40581.813589] ){+.+.+.}
[40581.813600] , at: 
[40581.813601] [<ffffffff8191588c>] seq_read+0xec/0x1400
[40581.813603] 
[40581.813603] but task is already holding lock:
[40581.813605]  (
[40581.813607] sb_writers
[40581.813608] #8
[40581.813609] ){.+.+.+}
[40581.813617] , at: 
[40581.813617] [<ffffffff81889c6a>] do_sendfile+0x9ea/0x1270
[40581.813618] 
[40581.813618] which lock already depends on the new lock.
[40581.813618] 
[40581.813620] 
[40581.813620] the existing dependency chain (in reverse order) is:
[40581.813623] 
[40581.813623] -> #3
[40581.813624]  (
[40581.813625] sb_writers
[40581.813626] #8
[40581.813628] ){.+.+.+}
[40581.813628] :
[40581.813636]        
[40581.813636] [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710
[40581.813640]        
[40581.813640] [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0
[40581.813644]        
[40581.813645] [<ffffffff8189037e>] __sb_start_write+0xae/0x360
[40581.813650]        
[40581.813650] [<ffffffff819066fa>] mnt_want_write+0x4a/0xc0
[40581.813661]        
[40581.813661] [<ffffffffa16cdfbd>] ovl_want_write+0x8d/0xf0 [overlay]
[40581.813668]        
[40581.813668] [<ffffffffa16d4dc7>] ovl_do_remove+0xe7/0x9a0 [overlay]
[40581.813675]        
[40581.813676] [<ffffffffa16d5696>] ovl_rmdir+0x16/0x20 [overlay]
[40581.813680]        
[40581.813680] [<ffffffff818af90f>] vfs_rmdir+0x1bf/0x3e0
[40581.813685]        
[40581.813686] [<ffffffff818c5965>] do_rmdir+0x2c5/0x430
[40581.813689]        
[40581.813690] [<ffffffff818c8242>] SyS_unlinkat+0x22/0x30
[40581.813696]        
[40581.813696] [<ffffffff8100924d>] do_syscall_64+0x19d/0x540
[40581.813704]        
[40581.813704] [<ffffffff82c8af24>] return_from_SYSCALL_64+0x0/0x7a
[40581.813707] 
[40581.813707] -> #2
[40581.813709]  (
[40581.813710] &sb->s_type->i_mutex_key
[40581.813711] #17
[40581.813712] ){++++++}
[40581.813713] :
[40581.813720]        
[40581.813720] [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710
[40581.813726]        
[40581.813726] [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0
[40581.813736]        
[40581.813736] [<ffffffff82c84261>] down_read+0xa1/0x1c0
[40581.813740]        
[40581.813740] [<ffffffff818ae2db>] lookup_slow+0x17b/0x4f0
[40581.813744]        
[40581.813744] [<ffffffff818bb228>] walk_component+0x728/0x1d10
[40581.813750]        
[40581.813750] [<ffffffff818bcc1e>] link_path_walk+0x40e/0x1690
[40581.813758]        
[40581.813758] [<ffffffff818c0274>] path_openat+0x1c4/0x3870
[40581.813764]        
[40581.813765] [<ffffffff818c6d19>] do_filp_open+0x1a9/0x2e0
[40581.813772]        
[40581.813772] [<ffffffff8189832b>] do_open_execat+0xcb/0x420
[40581.813783]        
[40581.813784] [<ffffffff8189932b>] open_exec+0x2b/0x50
[40581.813793]        
[40581.813793] [<ffffffff819ea78c>] load_elf_binary+0x103c/0x3550
[40581.813807]        
[40581.813807] [<ffffffff8189a852>] search_binary_handler+0x162/0x480
[40581.813814]        
[40581.813815] [<ffffffff818a106a>] do_execveat_common.isra.24+0x138a/0x2570
[40581.813823]        
[40581.813824] [<ffffffff818a2efa>] SyS_execve+0x3a/0x50
[40581.813828]        
[40581.813828] [<ffffffff8100924d>] do_syscall_64+0x19d/0x540
[40581.813833]        
[40581.813834] [<ffffffff82c8af24>] return_from_SYSCALL_64+0x0/0x7a
[40581.813843] 
[40581.813843] -> #1
[40581.813845]  (
[40581.813850] &sig->cred_guard_mutex
[40581.813852] ){+.+.+.}
[40581.813852] :
[40581.813861]        
[40581.813862] [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710
[40581.813871]        
[40581.813871] [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0
[40581.813885]        
[40581.813886] [<ffffffff82c7d1d3>] mutex_lock_killable_nested+0x103/0xb90
[40581.813895]        
[40581.813896] [<ffffffff81a3f7a6>] do_io_accounting+0x186/0xcf0
[40581.813902]        
[40581.813903] [<ffffffff81a40329>] proc_tgid_io_accounting+0x19/0x20
[40581.813908]        
[40581.813909] [<ffffffff81a41494>] proc_single_show+0x114/0x1d0
[40581.813917]        
[40581.813917] [<ffffffff81915ad4>] seq_read+0x334/0x1400
[40581.813921]        
[40581.813921] [<ffffffff81884da6>] __vfs_read+0x106/0x990
[40581.813927]        
[40581.813927] [<ffffffff81886038>] vfs_read+0x118/0x400
[40581.813931]        
[40581.813931] [<ffffffff8188aebf>] SyS_read+0xdf/0x1d0
[40581.813938]        
[40581.813938] [<ffffffff8100924d>] do_syscall_64+0x19d/0x540
[40581.813945]        
[40581.813946] [<ffffffff82c8af24>] return_from_SYSCALL_64+0x0/0x7a
[40581.813949] 
[40581.813949] -> #0
[40581.813951]  (
[40581.813954] &p->lock
[40581.813955] ){+.+.+.}
[40581.813956] :
[40581.813961]        
[40581.813961] [<ffffffff81337938>] validate_chain.isra.31+0x2b28/0x4c00
[40581.813965]        
[40581.813966] [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710
[40581.813972]        
[40581.813972] [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0
[40581.813977]        
[40581.813977] [<ffffffff82c7f2f8>] mutex_lock_nested+0x108/0xa50
[40581.813983]        
[40581.813983] [<ffffffff8191588c>] seq_read+0xec/0x1400
[40581.813993]        
[40581.813993] [<ffffffff81a7bdde>] kernfs_fop_read+0x35e/0x640
[40581.813998]        
[40581.813998] [<ffffffff818812ef>] do_loop_readv_writev+0xdf/0x250
[40581.814003]        
[40581.814003] [<ffffffff81886fb5>] do_readv_writev+0x6a5/0xab0
[40581.814007]        
[40581.814007] [<ffffffff81887446>] vfs_readv+0x86/0xe0
[40581.814020]        
[40581.814020] [<ffffffff8194fdac>] default_file_splice_read+0x49c/0xbb0
[40581.814026]        
[40581.814027] [<ffffffff8194eb74>] do_splice_to+0x104/0x1a0
[40581.814033]        
[40581.814033] [<ffffffff8194ee80>] splice_direct_to_actor+0x270/0xa00
[40581.814039]        
[40581.814039] [<ffffffff8194f7a4>] do_splice_direct+0x194/0x300
[40581.814046]        
[40581.814046] [<ffffffff818896e9>] do_sendfile+0x469/0x1270
[40581.814051]        
[40581.814051] [<ffffffff8188bcb0>] SyS_sendfile64+0x140/0x150
[40581.814054]        
[40581.814055] [<ffffffff8100924d>] do_syscall_64+0x19d/0x540
[40581.814059]        
[40581.814060] [<ffffffff82c8af24>] return_from_SYSCALL_64+0x0/0x7a
[40581.814062] 
[40581.814062] other info that might help us debug this:
[40581.814062] 
[40581.814066] Chain exists of:
[40581.814066]   
[40581.814067] &p->lock
[40581.814069]  --> 
[40581.814070] &sb->s_type->i_mutex_key
[40581.814071] #17
[40581.814073]  --> 
[40581.814076] sb_writers
[40581.814079] #8
[40581.814079] 
[40581.814079] 
[40581.814080]  Possible unsafe locking scenario:
[40581.814080] 
[40581.814081]        CPU0                    CPU1
[40581.814083]        ----                    ----
[40581.814085]   lock(
[40581.814088] sb_writers
[40581.814089] #8
[40581.814089] );
[40581.814091]                                lock(
[40581.814093] &sb->s_type->i_mutex_key
[40581.814095] #17
[40581.814095] );
[40581.814097]                                lock(
[40581.814098] sb_writers
[40581.814099] #8
[40581.814099] );
[40581.814101]   lock(
[40581.814103] &p->lock
[40581.814103] );
[40581.814104] 
[40581.814104]  *** DEADLOCK ***
[40581.814104] 
[40581.814106] 1 lock held by trinity-c104/39795:
[40581.814109]  #0: 
[40581.814111]  (
[40581.814112] sb_writers
[40581.814113] #8
[40581.814114] ){.+.+.+}
[40581.814116] , at: 
[40581.814117] [<ffffffff81889c6a>] do_sendfile+0x9ea/0x1270
[40581.814118] 
[40581.814118] stack backtrace:
[40581.814121] CPU: 25 PID: 39795 Comm: trinity-c104 Tainted: G        W       4.9.0-rc1-lockfix-uncorev2+ #51
[40581.814123] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[40581.814131]  ffff880825886da0 ffffffff81d37124 0000000041b58ab3 ffffffff83348dc7
[40581.814138]  ffffffff81d37064 0000000000000001 0000000000000000 ffff8807b4d5d5d8
[40581.814145]  00000000bfc018be ffff880825886d78 0000000000000001 0000000000000000
[40581.814146] Call Trace:
[40581.814155]  [<ffffffff81d37124>] dump_stack+0xc0/0x12c
[40581.814159]  [<ffffffff81d37064>] ? _atomic_dec_and_lock+0xc4/0xc4
[40581.814168]  [<ffffffff81332fa9>] print_circular_bug+0x3c9/0x5e0
[40581.814171]  [<ffffffff81332be0>] ? print_circular_bug_entry+0xd0/0xd0
[40581.814176]  [<ffffffff81337938>] validate_chain.isra.31+0x2b28/0x4c00
[40581.814182]  [<ffffffff81334e10>] ? check_irq_usage+0x300/0x300
[40581.814192]  [<ffffffff81334e10>] ? check_irq_usage+0x300/0x300
[40581.814196]  [<ffffffff81de21f3>] ? __this_cpu_preempt_check+0x13/0x20
[40581.814200]  [<ffffffff81336045>] ? validate_chain.isra.31+0x1235/0x4c00
[40581.814204]  [<ffffffff8133a4d0>] ? print_usage_bug+0x700/0x700
[40581.814208]  [<ffffffff812abdc0>] ? sched_clock_cpu+0x1b0/0x310
[40581.814214]  [<ffffffff8133a4d0>] ? print_usage_bug+0x700/0x700
[40581.814219]  [<ffffffff812abdc0>] ? sched_clock_cpu+0x1b0/0x310
[40581.814226]  [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710
[40581.814232]  [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0
[40581.814235]  [<ffffffff8191588c>] ? seq_read+0xec/0x1400
[40581.814240]  [<ffffffff8191588c>] ? seq_read+0xec/0x1400
[40581.814243]  [<ffffffff82c7f2f8>] mutex_lock_nested+0x108/0xa50
[40581.814246]  [<ffffffff8191588c>] ? seq_read+0xec/0x1400
[40581.814250]  [<ffffffff8191588c>] ? seq_read+0xec/0x1400
[40581.814256]  [<ffffffff817fedd6>] ? kasan_unpoison_shadow+0x36/0x50
[40581.814259]  [<ffffffff82c7f1f0>] ? mutex_lock_interruptible_nested+0xb40/0xb40
[40581.814264]  [<ffffffff8168ec2c>] ? get_page_from_freelist+0x175c/0x2ed0
[40581.814271]  [<ffffffff8168d4d0>] ? __isolate_free_page+0x7e0/0x7e0
[40581.814275]  [<ffffffff8133c3f9>] ? mark_held_locks+0x109/0x290
[40581.814278]  [<ffffffff8191588c>] seq_read+0xec/0x1400
[40581.814283]  [<ffffffff813ac01d>] ? rcu_lockdep_current_cpu_online+0x11d/0x1d0
[40581.814290]  [<ffffffff819157a0>] ? seq_hlist_start_percpu+0x4a0/0x4a0
[40581.814295]  [<ffffffff8198ef20>] ? __fsnotify_update_child_dentry_flags.part.0+0x2b0/0x2b0
[40581.814298]  [<ffffffff81de21f3>] ? __this_cpu_preempt_check+0x13/0x20
[40581.814300]  [<ffffffff81a7bdde>] kernfs_fop_read+0x35e/0x640
[40581.814305]  [<ffffffff81b49a55>] ? selinux_file_permission+0x3c5/0x550
[40581.814310]  [<ffffffff81a7ba80>] ? kernfs_fop_open+0xf40/0xf40
[40581.814312]  [<ffffffff818812ef>] do_loop_readv_writev+0xdf/0x250
[40581.814318]  [<ffffffff81886fb5>] do_readv_writev+0x6a5/0xab0
[40581.814324]  [<ffffffff81886910>] ? vfs_write+0x5f0/0x5f0
[40581.814328]  [<ffffffff81d8fbaf>] ? iov_iter_get_pages_alloc+0x53f/0x1990
[40581.814332]  [<ffffffff81d8f670>] ? iov_iter_npages+0xed0/0xed0
[40581.814336]  [<ffffffff8133c3f9>] ? mark_held_locks+0x109/0x290
[40581.814339]  [<ffffffff81de21f3>] ? __this_cpu_preempt_check+0x13/0x20
[40581.814344]  [<ffffffff8133caa0>] ? trace_hardirqs_on_caller+0x520/0x720
[40581.814347]  [<ffffffff81887446>] vfs_readv+0x86/0xe0
[40581.814352]  [<ffffffff8194fdac>] default_file_splice_read+0x49c/0xbb0
[40581.814361]  [<ffffffff8194f910>] ? do_splice_direct+0x300/0x300
[40581.814363]  [<ffffffff817fef3d>] ? kasan_kmalloc+0xad/0xe0
[40581.814366]  [<ffffffff818a6287>] ? alloc_pipe_info+0x1b7/0x410
[40581.814371]  [<ffffffff8133a4d0>] ? print_usage_bug+0x700/0x700
[40581.814373]  [<ffffffff8188bcb0>] ? SyS_sendfile64+0x140/0x150
[40581.814377]  [<ffffffff8100924d>] ? do_syscall_64+0x19d/0x540
[40581.814380]  [<ffffffff82c8af24>] ? entry_SYSCALL64_slow_path+0x25/0x25
[40581.814382]  [<ffffffff812abdc0>] ? sched_clock_cpu+0x1b0/0x310
[40581.814386]  [<ffffffff8133c3f9>] ? mark_held_locks+0x109/0x290
[40581.814390]  [<ffffffff8133caa0>] ? trace_hardirqs_on_caller+0x520/0x720
[40581.814395]  [<ffffffff8198ef20>] ? __fsnotify_update_child_dentry_flags.part.0+0x2b0/0x2b0
[40581.814398]  [<ffffffff81b49a55>] ? selinux_file_permission+0x3c5/0x550
[40581.814404]  [<ffffffff81b26e96>] ? security_file_permission+0x176/0x220
[40581.814408]  [<ffffffff81885c78>] ? rw_verify_area+0xd8/0x380
[40581.814411]  [<ffffffff8194eb74>] do_splice_to+0x104/0x1a0
[40581.814415]  [<ffffffff818a63b7>] ? alloc_pipe_info+0x2e7/0x410
[40581.814419]  [<ffffffff8194ee80>] splice_direct_to_actor+0x270/0xa00
[40581.814424]  [<ffffffff8194c5e0>] ? wakeup_pipe_readers+0x90/0x90
[40581.814429]  [<ffffffff8194ec10>] ? do_splice_to+0x1a0/0x1a0
[40581.814432]  [<ffffffff81885c78>] ? rw_verify_area+0xd8/0x380
[40581.814438]  [<ffffffff8194f7a4>] do_splice_direct+0x194/0x300
[40581.814443]  [<ffffffff8194f610>] ? splice_direct_to_actor+0xa00/0xa00
[40581.814450]  [<ffffffff81278cee>] ? preempt_count_sub+0x5e/0xe0
[40581.814452]  [<ffffffff81890415>] ? __sb_start_write+0x145/0x360
[40581.814457]  [<ffffffff818896e9>] do_sendfile+0x469/0x1270
[40581.814461]  [<ffffffff81889280>] ? do_compat_pwritev64.isra.16+0xd0/0xd0
[40581.814466]  [<ffffffff814cb287>] ? __audit_syscall_exit+0x637/0x960
[40581.814469]  [<ffffffff81006afb>] ? syscall_trace_enter+0x89b/0x1930
[40581.814473]  [<ffffffff817f7993>] ? kfree+0x3f3/0x620
[40581.814477]  [<ffffffff8188bcb0>] SyS_sendfile64+0x140/0x150
[40581.814479]  [<ffffffff8188bb70>] ? SyS_sendfile+0x140/0x140
[40581.814482]  [<ffffffff81de21f3>] ? __this_cpu_preempt_check+0x13/0x20
[40581.814485]  [<ffffffff8188bb70>] ? SyS_sendfile+0x140/0x140
[40581.814487]  [<ffffffff8100924d>] do_syscall_64+0x19d/0x540
[40581.814491]  [<ffffffff82c8af24>] entry_SYSCALL64_slow_path+0x25/0x25
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Miklos Szeredi Oct. 24, 2016, 12:57 p.m. UTC | #24
On Fri, Oct 21, 2016 at 5:38 PM, CAI Qian <caiqian@redhat.com> wrote:
>
> ----- Original Message -----
>> From: "Jan Kara" <jack@suse.cz>
>> Sent: Friday, October 7, 2016 3:08:38 AM
>> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked)
>>
>>
>> So I believe this may be just a problem in overlayfs lockdep annotation
>> (see below). Added Miklos to CC.
>>
>> > Wait. There is also a lockep happened before the xfs internal error as
>> > well.
>> >
>> > [ 5839.452325] ======================================================
>> > [ 5839.459221] [ INFO: possible circular locking dependency detected ]
>> > [ 5839.466215] 4.8.0-rc8-splice-fixw-proc+ #4 Not tainted
>> > [ 5839.471945] -------------------------------------------------------
>> > [ 5839.478937] trinity-c220/69531 is trying to acquire lock:
>> > [ 5839.484961]  (&p->lock){+.+.+.}, at: [<ffffffff812ac69c>]
>> > seq_read+0x4c/0x3e0
>> > [ 5839.492967]
>> > but task is already holding lock:
>> > [ 5839.499476]  (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>]
>> > __sb_start_write+0xd1/0xf0
>> > [ 5839.508560]
>> > which lock already depends on the new lock.
>> >
>> > [ 5839.517686]
>> > the existing dependency chain (in reverse order) is:
>> > [ 5839.526036]
>> > -> #3 (sb_writers#8){.+.+.+}:
>> > [ 5839.530751]        [<ffffffff810ff174>] lock_acquire+0xd4/0x240
>> > [ 5839.537368]        [<ffffffff810f8f4a>] percpu_down_read+0x4a/0x90
>> > [ 5839.544275]        [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0
>> > [ 5839.551181]        [<ffffffff812a8544>] mnt_want_write+0x24/0x50
>> > [ 5839.557892]        [<ffffffffa04a398f>] ovl_want_write+0x1f/0x30
>> > [overlay]
>> > [ 5839.565577]        [<ffffffffa04a6036>] ovl_do_remove+0x46/0x480
>> > [overlay]
>> > [ 5839.573259]        [<ffffffffa04a64a3>] ovl_unlink+0x13/0x20 [overlay]
>> > [ 5839.580555]        [<ffffffff812918ea>] vfs_unlink+0xda/0x190
>> > [ 5839.586979]        [<ffffffff81293698>] do_unlinkat+0x268/0x2b0
>> > [ 5839.593599]        [<ffffffff8129419b>] SyS_unlinkat+0x1b/0x30
>> > [ 5839.600120]        [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0
>> > [ 5839.606836]        [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a
>> > [ 5839.614231]
>>
>> So here is IMO the real culprit: do_unlinkat() grabs fs freeze protection
>> through mnt_want_write(), we grab also i_rwsem in do_unlinkat() in
>> I_MUTEX_PARENT class a bit after that and further down in vfs_unlink() we
>> grab i_rwsem for the unlinked inode itself in default I_MUTEX class. Then
>> in ovl_want_write() we grab freeze protection again, but this time for the
>> upper filesystem. That establishes sb_writers (overlay) -> I_MUTEX_PARENT
>> (overlay) -> I_MUTEX (overlay) -> sb_writers (FS-A) lock ordering
>> (we maintain locking classes per fs type so that's why I'm showing fs type
>> in parenthesis).
>>
>> Now this nesting is nasty because once you add locks that are not tracked
>> per fs type into the mix, you get cycles. In this case we've got
>> seq_file->lock and cred_guard_mutex into the mix - the splice path is
>> doing sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex (splicing
>> from seq_file into the real filesystem). Exec path further establishes
>> cred_guard_mutex -> I_MUTEX (overlay) which closes the full cycle:
>>
>> sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex -> i_mutex
>> (overlay) -> sb_writers (FS-A)
>>
>> If I analyzed the lockdep trace, this looks like a real (although remote)
>> deadlock possibility. Miklos?

Yeah, you can leave out seq_file->lock, the chain can be made up from
just 3 parts:

unlink : i_mutex(ov) -> sb_writers(fs-a)
splice: sb_writers(fs-a) ->cred_guard_mutex (though proc_tgid_io_accounting)
exec:  cred_guard_mutex -> i_mutex(ov)

None of those are incorrect, but the cred_guard_mutex usage is also
pretty weird: taken outside path lookup as well as inside ->read() in
proc.

Doesn't look a serious worry in practice, I don't think anybody would
trigger the actual deadlock in a 1000years (an fs freeze is needed at
just the right moment in addition to the above, very unlikely chain).

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/proc/base.c b/fs/proc/base.c
index d588d14..489d2d6 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -400,23 +400,6 @@  static const struct file_operations proc_pid_cmdline_ops = {
 	.llseek	= generic_file_llseek,
 };
 
-static int proc_pid_auxv(struct seq_file *m, struct pid_namespace *ns,
-			 struct pid *pid, struct task_struct *task)
-{
-	struct mm_struct *mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
-	if (mm && !IS_ERR(mm)) {
-		unsigned int nwords = 0;
-		do {
-			nwords += 2;
-		} while (mm->saved_auxv[nwords - 2] != 0); /* AT_NULL */
-		seq_write(m, mm->saved_auxv, nwords * sizeof(mm->saved_auxv[0]));
-		mmput(mm);
-		return 0;
-	} else
-		return PTR_ERR(mm);
-}
-
-
 #ifdef CONFIG_KALLSYMS
 /*
  * Provides a wchan file via kallsyms in a proper one-value-per-file format.
@@ -1014,6 +997,30 @@  static const struct file_operations proc_environ_operations = {
 	.release	= mem_release,
 };
 
+static int auxv_open(struct inode *inode, struct file *file)
+{
+	return __mem_open(inode, file, PTRACE_MODE_READ_FSCREDS);
+}
+
+static ssize_t auxv_read(struct file *file, char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	struct mm_struct *mm = file->private_data;
+	unsigned int nwords = 0;
+	do {
+		nwords += 2;
+	} while (mm->saved_auxv[nwords - 2] != 0); /* AT_NULL */
+	return simple_read_from_buffer(buf, count, ppos, mm->saved_auxv,
+				       nwords * sizeof(mm->saved_auxv[0]));
+}
+
+static const struct file_operations proc_auxv_operations = {
+	.open		= auxv_open,
+	.read		= auxv_read,
+	.llseek		= generic_file_llseek,
+	.release	= mem_release,
+};
+
 static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
 			    loff_t *ppos)
 {
@@ -2822,7 +2829,7 @@  static const struct pid_entry tgid_base_stuff[] = {
 	DIR("net",        S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
 #endif
 	REG("environ",    S_IRUSR, proc_environ_operations),
-	ONE("auxv",       S_IRUSR, proc_pid_auxv),
+	REG("auxv",       S_IRUSR, proc_auxv_operations),
 	ONE("status",     S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
 	ONE("limits",	  S_IRUGO, proc_pid_limits),
@@ -3210,7 +3217,7 @@  static const struct pid_entry tid_base_stuff[] = {
 	DIR("net",        S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
 #endif
 	REG("environ",   S_IRUSR, proc_environ_operations),
-	ONE("auxv",      S_IRUSR, proc_pid_auxv),
+	REG("auxv",      S_IRUSR, proc_auxv_operations),
 	ONE("status",    S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
 	ONE("limits",	 S_IRUGO, proc_pid_limits),