Message ID | 20240930034406.7600-1-laoar.shao@gmail.com (mailing list archive)
---|---
State | Not Applicable, archived
Series | xfs: Fix circular locking during xfs inode reclamation
On Mon, Sep 30, 2024 at 11:44:06AM +0800, Yafang Shao wrote:
> I encountered the following error messages on our test servers:
>
> [ 2553.303035] ======================================================
> [ 2553.303692] WARNING: possible circular locking dependency detected
> [ 2553.304363] 6.11.0+ #27 Not tainted
> [ 2553.304732] ------------------------------------------------------
> [ 2553.305398] python/129251 is trying to acquire lock:
> [ 2553.305940] ffff89b18582e318 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x70/0x190 [xfs]
> [ 2553.307066]
>                but task is already holding lock:
> [ 2553.307682] ffffffffb4324de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x368/0xb10
> [ 2553.308670]
>                which lock already depends on the new lock.

.....

> [ 2553.342664]  Possible unsafe locking scenario:
>
> [ 2553.343621]        CPU0                    CPU1
> [ 2553.344300]        ----                    ----
> [ 2553.344957]   lock(fs_reclaim);
> [ 2553.345510]                                lock(&xfs_nondir_ilock_class);
> [ 2553.346326]                                lock(fs_reclaim);
> [ 2553.347015]   rlock(&xfs_nondir_ilock_class);
> [ 2553.347639]
>                 *** DEADLOCK ***
>
> The deadlock is as follows:
>
> CPU0                            CPU1
> ------                          ------
>
> alloc_anon_folio()
>   vma_alloc_folio(__GFP_FS)
>     fs_reclaim_acquire(__GFP_FS);
>       __fs_reclaim_acquire();
>
>                                 xfs_attr_list()
>                                   xfs_ilock()
>                                   kmalloc(__GFP_FS);
>                                     __fs_reclaim_acquire();
>
>   xfs_ilock

Yet another lockdep false positive. listxattr() is not in a
transaction context on a referenced inode, so GFP_KERNEL is correct.
The problem is that lockdep has no clue that fs_reclaim context can
only lock unreferenced inodes, so we can actually run GFP_KERNEL
context memory allocation with a locked, referenced inode safely.

We typically use __GFP_NOLOCKDEP on these sorts of allocations, but
the long term fix is to address the lockdep annotations to take
reclaim context into account. We can't do that until the realtime
inode subclasses are removed, which will give us the spare lockdep
subclasses to add a reclaim context subclass. That is buried in the
middle of a much larger rework:

https://lore.kernel.org/linux-xfs/172437087542.59588.13853236455832390956.stgit@frogsfrogsfrogs/

-Dave.
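For illustration, the short-term __GFP_NOLOCKDEP approach mentioned
above would look roughly like this at the allocation site in
xfs_attr_shortform_list(). This is a sketch only, not a patch posted in
this thread:

	/*
	 * Keep GFP_KERNEL, which is correct here because listxattr()
	 * holds an inode reference so reclaim can never take this
	 * ilock, and annotate the allocation so lockdep does not
	 * record a false fs_reclaim -> ilock dependency.
	 */
	sbp = sbuf = kmalloc(sbsize,
			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);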
On Tue, Oct 1, 2024 at 6:53 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Sep 30, 2024 at 11:44:06AM +0800, Yafang Shao wrote:
> > I encountered the following error messages on our test servers:
> >
> > [ 2553.303035] ======================================================
> > [ 2553.303692] WARNING: possible circular locking dependency detected
> > [ 2553.304363] 6.11.0+ #27 Not tainted
> > [ 2553.304732] ------------------------------------------------------
> > [ 2553.305398] python/129251 is trying to acquire lock:
> > [ 2553.305940] ffff89b18582e318 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x70/0x190 [xfs]
> > [ 2553.307066]
> >                but task is already holding lock:
> > [ 2553.307682] ffffffffb4324de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x368/0xb10
> > [ 2553.308670]
> >                which lock already depends on the new lock.
>
> .....
>
> > [ 2553.342664]  Possible unsafe locking scenario:
> >
> > [ 2553.343621]        CPU0                    CPU1
> > [ 2553.344300]        ----                    ----
> > [ 2553.344957]   lock(fs_reclaim);
> > [ 2553.345510]                                lock(&xfs_nondir_ilock_class);
> > [ 2553.346326]                                lock(fs_reclaim);
> > [ 2553.347015]   rlock(&xfs_nondir_ilock_class);
> > [ 2553.347639]
> >                 *** DEADLOCK ***
> >
> > The deadlock is as follows:
> >
> > CPU0                            CPU1
> > ------                          ------
> >
> > alloc_anon_folio()
> >   vma_alloc_folio(__GFP_FS)
> >     fs_reclaim_acquire(__GFP_FS);
> >       __fs_reclaim_acquire();
> >
> >                                 xfs_attr_list()
> >                                   xfs_ilock()
> >                                   kmalloc(__GFP_FS);
> >                                     __fs_reclaim_acquire();
> >
> >   xfs_ilock
>
> Yet another lockdep false positive. listxattr() is not in a
> transaction context on a referenced inode, so GFP_KERNEL is correct.
> The problem is that lockdep has no clue that fs_reclaim context can
> only lock unreferenced inodes, so we can actually run GFP_KERNEL
> context memory allocation with a locked, referenced inode safely.

Thanks for your detailed explanation.

> We typically use __GFP_NOLOCKDEP on these sorts of allocations, but
> the long term fix is to address the lockdep annotations to take
> reclaim context into account. We can't do that until the realtime
> inode subclasses are removed, which will give us the spare lockdep
> subclasses to add a reclaim context subclass. That is buried in the
> middle of a much larger rework:
>
> https://lore.kernel.org/linux-xfs/172437087542.59588.13853236455832390956.stgit@frogsfrogsfrogs/

Thank you for the reference link. While I'm not able to review the
patchset in detail, I'll read through it to gain more understanding.
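The patch below takes the other route and switches the call site to
GFP_NOFS. A related, scoped way to get NOFS behaviour is the
memalloc_nofs API from <linux/sched/mm.h>; the following is a minimal
sketch assuming it were wrapped around this allocation, not what the
posted patch actually does:

	unsigned int nofs_flags;

	/*
	 * Enter a NOFS allocation scope: allocations inside it behave
	 * as if GFP_NOFS had been passed, so direct reclaim will not
	 * re-enter filesystem code while the ilock is held.
	 */
	nofs_flags = memalloc_nofs_save();
	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
	memalloc_nofs_restore(nofs_flags);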
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 7db386304875..0dc4600010b8 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -114,7 +114,7 @@ xfs_attr_shortform_list(
 	 * It didn't all fit, so we have to sort everything on hashval.
 	 */
 	sbsize = sf->count * sizeof(*sbuf);
-	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
+	sbp = sbuf = kmalloc(sbsize, GFP_NOFS | __GFP_NOFAIL);
 
 	/*
 	 * Scan the attribute list for the rest of the entries, storing
I encountered the following error messages on our test servers:

[ 2553.303035] ======================================================
[ 2553.303692] WARNING: possible circular locking dependency detected
[ 2553.304363] 6.11.0+ #27 Not tainted
[ 2553.304732] ------------------------------------------------------
[ 2553.305398] python/129251 is trying to acquire lock:
[ 2553.305940] ffff89b18582e318 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x70/0x190 [xfs]
[ 2553.307066]
               but task is already holding lock:
[ 2553.307682] ffffffffb4324de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x368/0xb10
[ 2553.308670]
               which lock already depends on the new lock.
[ 2553.309487]
               the existing dependency chain (in reverse order) is:
[ 2553.310276]
               -> #1 (fs_reclaim){+.+.}-{0:0}:
[ 2553.310853]        __lock_acquire+0x508/0xba0
[ 2553.311315]        lock_acquire+0xb4/0x2c0
[ 2553.311764]        fs_reclaim_acquire+0xa7/0x100
[ 2553.312231]        __kmalloc_noprof+0xa7/0x430
[ 2553.312668]        xfs_attr_shortform_list+0x8f/0x560 [xfs]
[ 2553.313402]        xfs_attr_list_ilocked+0x82/0x90 [xfs]
[ 2553.314087]        xfs_attr_list+0x78/0xa0 [xfs]
[ 2553.314701]        xfs_vn_listxattr+0x80/0xd0 [xfs]
[ 2553.315354]        vfs_listxattr+0x42/0x80
[ 2553.315782]        listxattr+0x5f/0x100
[ 2553.316181]        __x64_sys_flistxattr+0x5c/0xb0
[ 2553.316660]        x64_sys_call+0x1946/0x20d0
[ 2553.317118]        do_syscall_64+0x6c/0x180
[ 2553.317540]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 2553.318116]
               -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
[ 2553.318802]        check_prev_add+0xed/0xcc0
[ 2553.319251]        validate_chain+0x535/0x840
[ 2553.319693]        __lock_acquire+0x508/0xba0
[ 2553.320155]        lock_acquire+0xb4/0x2c0
[ 2553.320560]        down_read_nested+0x36/0x170
[ 2553.321028]        xfs_ilock+0x70/0x190 [xfs]
[ 2553.321625]        xfs_can_free_eofblocks+0xd1/0x170 [xfs]
[ 2553.322327]        xfs_inode_needs_inactive+0x97/0xd0 [xfs]
[ 2553.323010]        xfs_inode_mark_reclaimable+0x81/0xd0 [xfs]
[ 2553.323694]        xfs_fs_destroy_inode+0xb7/0x150 [xfs]
[ 2553.324356]        destroy_inode+0x3e/0x80
[ 2553.325064]        evict+0x1e5/0x2f0
[ 2553.325607]        dispose_list+0x4d/0x70
[ 2553.326261]        prune_icache_sb+0x5c/0x90
[ 2553.326870]        super_cache_scan+0x15b/0x1d0
[ 2553.327476]        do_shrink_slab+0x157/0x6a0
[ 2553.328098]        shrink_slab_memcg+0x260/0x5d0
[ 2553.328747]        shrink_slab+0x2a3/0x360
[ 2553.329352]        shrink_node_memcgs+0x1eb/0x260
[ 2553.329995]        shrink_node+0x108/0x430
[ 2553.330551]        shrink_zones.constprop.0+0x89/0x2a0
[ 2553.331230]        do_try_to_free_pages+0x4c/0x2f0
[ 2553.331850]        try_to_free_pages+0xfc/0x2c0
[ 2553.332416]        __alloc_pages_slowpath.constprop.0+0x39c/0xb10
[ 2553.333172]        __alloc_pages_noprof+0x3a1/0x3d0
[ 2553.333847]        alloc_pages_mpol_noprof+0xd9/0x1e0
[ 2553.334499]        vma_alloc_folio_noprof+0x64/0xd0
[ 2553.335159]        alloc_anon_folio+0x1b3/0x390
[ 2553.335757]        do_anonymous_page+0x71/0x5b0
[ 2553.336355]        handle_pte_fault+0x225/0x230
[ 2553.337019]        __handle_mm_fault+0x31b/0x760
[ 2553.337760]        handle_mm_fault+0x12a/0x330
[ 2553.339313]        do_user_addr_fault+0x219/0x7b0
[ 2553.340149]        exc_page_fault+0x6d/0x210
[ 2553.340780]        asm_exc_page_fault+0x27/0x30
[ 2553.341358]
               other info that might help us debug this:
[ 2553.342664]  Possible unsafe locking scenario:
[ 2553.343621]        CPU0                    CPU1
[ 2553.344300]        ----                    ----
[ 2553.344957]   lock(fs_reclaim);
[ 2553.345510]                                lock(&xfs_nondir_ilock_class);
[ 2553.346326]                                lock(fs_reclaim);
[ 2553.347015]   rlock(&xfs_nondir_ilock_class);
[ 2553.347639]
                *** DEADLOCK ***

The deadlock is as follows:

CPU0                            CPU1
------                          ------

alloc_anon_folio()
  vma_alloc_folio(__GFP_FS)
    fs_reclaim_acquire(__GFP_FS);
      __fs_reclaim_acquire();

                                xfs_attr_list()
                                  xfs_ilock()
                                  kmalloc(__GFP_FS);
                                    __fs_reclaim_acquire();

  xfs_ilock

To prevent circular locking, we should use GFP_NOFS instead of
GFP_KERNEL in xfs_attr_shortform_list().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/xfs/xfs_attr_list.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
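For context, the listxattr side of the reported dependency chain (#1
above) can be exercised from userspace with something like the sketch
below; the file path is hypothetical, and actually reproducing the
lockdep report additionally requires memory pressure driving inode
reclaim in another task:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/xattr.h>
	#include <unistd.h>

	int main(void)
	{
		char list[1024];
		ssize_t len;
		/* Hypothetical file on an XFS mount. */
		int fd = open("/mnt/xfs/testfile", O_RDONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}

		/*
		 * flistxattr() walks vfs_listxattr() ->
		 * xfs_vn_listxattr() -> xfs_attr_list(), which takes
		 * the ilock and then allocates with GFP_KERNEL.
		 */
		len = flistxattr(fd, list, sizeof(list));
		if (len < 0)
			perror("flistxattr");
		else
			printf("xattr name list: %zd bytes\n", len);

		close(fd);
		return 0;
	}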