
xfs: Fix circular locking during xfs inode reclamation

Message ID 20240930034406.7600-1-laoar.shao@gmail.com (mailing list archive)
State Not Applicable, archived
Series xfs: Fix circular locking during xfs inode reclamation

Commit Message

Yafang Shao Sept. 30, 2024, 3:44 a.m. UTC
I encountered the following error messages on our test servers:

[ 2553.303035] ======================================================
[ 2553.303692] WARNING: possible circular locking dependency detected
[ 2553.304363] 6.11.0+ #27 Not tainted
[ 2553.304732] ------------------------------------------------------
[ 2553.305398] python/129251 is trying to acquire lock:
[ 2553.305940] ffff89b18582e318 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x70/0x190 [xfs]
[ 2553.307066]
but task is already holding lock:
[ 2553.307682] ffffffffb4324de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x368/0xb10
[ 2553.308670]
which lock already depends on the new lock.

[ 2553.309487]
the existing dependency chain (in reverse order) is:
[ 2553.310276]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 2553.310853]        __lock_acquire+0x508/0xba0
[ 2553.311315]        lock_acquire+0xb4/0x2c0
[ 2553.311764]        fs_reclaim_acquire+0xa7/0x100
[ 2553.312231]        __kmalloc_noprof+0xa7/0x430
[ 2553.312668]        xfs_attr_shortform_list+0x8f/0x560 [xfs]
[ 2553.313402]        xfs_attr_list_ilocked+0x82/0x90 [xfs]
[ 2553.314087]        xfs_attr_list+0x78/0xa0 [xfs]
[ 2553.314701]        xfs_vn_listxattr+0x80/0xd0 [xfs]
[ 2553.315354]        vfs_listxattr+0x42/0x80
[ 2553.315782]        listxattr+0x5f/0x100
[ 2553.316181]        __x64_sys_flistxattr+0x5c/0xb0
[ 2553.316660]        x64_sys_call+0x1946/0x20d0
[ 2553.317118]        do_syscall_64+0x6c/0x180
[ 2553.317540]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 2553.318116]
-> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
[ 2553.318802]        check_prev_add+0xed/0xcc0
[ 2553.319251]        validate_chain+0x535/0x840
[ 2553.319693]        __lock_acquire+0x508/0xba0
[ 2553.320155]        lock_acquire+0xb4/0x2c0
[ 2553.320560]        down_read_nested+0x36/0x170
[ 2553.321028]        xfs_ilock+0x70/0x190 [xfs]
[ 2553.321625]        xfs_can_free_eofblocks+0xd1/0x170 [xfs]
[ 2553.322327]        xfs_inode_needs_inactive+0x97/0xd0 [xfs]
[ 2553.323010]        xfs_inode_mark_reclaimable+0x81/0xd0 [xfs]
[ 2553.323694]        xfs_fs_destroy_inode+0xb7/0x150 [xfs]
[ 2553.324356]        destroy_inode+0x3e/0x80
[ 2553.325064]        evict+0x1e5/0x2f0
[ 2553.325607]        dispose_list+0x4d/0x70
[ 2553.326261]        prune_icache_sb+0x5c/0x90
[ 2553.326870]        super_cache_scan+0x15b/0x1d0
[ 2553.327476]        do_shrink_slab+0x157/0x6a0
[ 2553.328098]        shrink_slab_memcg+0x260/0x5d0
[ 2553.328747]        shrink_slab+0x2a3/0x360
[ 2553.329352]        shrink_node_memcgs+0x1eb/0x260
[ 2553.329995]        shrink_node+0x108/0x430
[ 2553.330551]        shrink_zones.constprop.0+0x89/0x2a0
[ 2553.331230]        do_try_to_free_pages+0x4c/0x2f0
[ 2553.331850]        try_to_free_pages+0xfc/0x2c0
[ 2553.332416]        __alloc_pages_slowpath.constprop.0+0x39c/0xb10
[ 2553.333172]        __alloc_pages_noprof+0x3a1/0x3d0
[ 2553.333847]        alloc_pages_mpol_noprof+0xd9/0x1e0
[ 2553.334499]        vma_alloc_folio_noprof+0x64/0xd0
[ 2553.335159]        alloc_anon_folio+0x1b3/0x390
[ 2553.335757]        do_anonymous_page+0x71/0x5b0
[ 2553.336355]        handle_pte_fault+0x225/0x230
[ 2553.337019]        __handle_mm_fault+0x31b/0x760
[ 2553.337760]        handle_mm_fault+0x12a/0x330
[ 2553.339313]        do_user_addr_fault+0x219/0x7b0
[ 2553.340149]        exc_page_fault+0x6d/0x210
[ 2553.340780]        asm_exc_page_fault+0x27/0x30
[ 2553.341358]
other info that might help us debug this:

[ 2553.342664]  Possible unsafe locking scenario:

[ 2553.343621]        CPU0                    CPU1
[ 2553.344300]        ----                    ----
[ 2553.344957]   lock(fs_reclaim);
[ 2553.345510]                                lock(&xfs_nondir_ilock_class);
[ 2553.346326]                                lock(fs_reclaim);
[ 2553.347015]   rlock(&xfs_nondir_ilock_class);
[ 2553.347639]
 *** DEADLOCK ***

The deadlock is as follows:

    CPU0                                  CPU1
   ------                                ------

  alloc_anon_folio()
    vma_alloc_folio(__GFP_FS)
     fs_reclaim_acquire(__GFP_FS);
       __fs_reclaim_acquire();

                                    xfs_attr_list()
                                      xfs_ilock()
                                      kmalloc(__GFP_FS);
                                        __fs_reclaim_acquire();

       xfs_ilock

To break this circular locking dependency, use GFP_NOFS instead of
GFP_KERNEL in xfs_attr_shortform_list().
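
The change itself is a one-liner (the full diff is below); a sketch of
it with the reasoning spelled out as a comment:

	/*
	 * GFP_NOFS clears __GFP_FS, so this allocation can no longer
	 * enter filesystem reclaim and try to take another inode's
	 * ILOCK while we already hold this inode's ILOCK via
	 * xfs_attr_list_ilocked().
	 */
	sbp = sbuf = kmalloc(sbsize, GFP_NOFS | __GFP_NOFAIL);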

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/xfs/xfs_attr_list.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dave Chinner Sept. 30, 2024, 10:53 p.m. UTC | #1
On Mon, Sep 30, 2024 at 11:44:06AM +0800, Yafang Shao wrote:
> I encountered the following error messages on our test servers:
> 
> [ 2553.303035] ======================================================
> [ 2553.303692] WARNING: possible circular locking dependency detected
> [ 2553.304363] 6.11.0+ #27 Not tainted
> [ 2553.304732] ------------------------------------------------------
> [ 2553.305398] python/129251 is trying to acquire lock:
> [ 2553.305940] ffff89b18582e318 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x70/0x190 [xfs]
> [ 2553.307066]
> but task is already holding lock:
> [ 2553.307682] ffffffffb4324de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x368/0xb10
> [ 2553.308670]
> which lock already depends on the new lock.

.....

> [ 2553.342664]  Possible unsafe locking scenario:
> 
> [ 2553.343621]        CPU0                    CPU1
> [ 2553.344300]        ----                    ----
> [ 2553.344957]   lock(fs_reclaim);
> [ 2553.345510]                                lock(&xfs_nondir_ilock_class);
> [ 2553.346326]                                lock(fs_reclaim);
> [ 2553.347015]   rlock(&xfs_nondir_ilock_class);
> [ 2553.347639]
>  *** DEADLOCK ***
> 
> The deadlock is as follows:
> 
>     CPU0                                  CPU1
>    ------                                ------
> 
>   alloc_anon_folio()
>     vma_alloc_folio(__GFP_FS)
>      fs_reclaim_acquire(__GFP_FS);
>        __fs_reclaim_acquire();
> 
>                                     xfs_attr_list()
>                                       xfs_ilock()
>                                       kmalloc(__GFP_FS);
>                                         __fs_reclaim_acquire();
> 
>        xfs_ilock

Yet another lockdep false positive. listxattr() is not in a
transaction context on a referenced inode, so GFP_KERNEL is correct.
The problem is that lockdep has no clue that fs_reclaim context can
only lock unreferenced inodes, so we can actually run GFP_KERNEL
context memory allocation with a locked, referenced inode safely.

We typically use __GFP_NOLOCKDEP on these sorts of allocations, but
the long-term fix is to address the lockdep annotations to take
reclaim context into account. We can't do that until the realtime
inode subclasses are removed, which will give us the spare lockdep
subclasses to add a reclaim context subclass. That is buried in the
middle of a much larger rework:

https://lore.kernel.org/linux-xfs/172437087542.59588.13853236455832390956.stgit@frogsfrogsfrogs/
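
For reference, a minimal sketch of that __GFP_NOLOCKDEP approach for
the allocation in question (untested, illustration only) would be:

	/*
	 * Keep GFP_KERNEL semantics, but tell lockdep not to track
	 * this allocation against fs_reclaim: the inode is referenced
	 * and locked here, and reclaim only ever locks unreferenced
	 * inodes, so the reported cycle cannot happen in practice.
	 */
	sbp = sbuf = kmalloc(sbsize,
			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);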

-Dave.
Yafang Shao Oct. 5, 2024, 6:24 a.m. UTC | #2
On Tue, Oct 1, 2024 at 6:53 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Sep 30, 2024 at 11:44:06AM +0800, Yafang Shao wrote:
> > I encountered the following error messages on our test servers:
> >
> > [ 2553.303035] ======================================================
> > [ 2553.303692] WARNING: possible circular locking dependency detected
> > [ 2553.304363] 6.11.0+ #27 Not tainted
> > [ 2553.304732] ------------------------------------------------------
> > [ 2553.305398] python/129251 is trying to acquire lock:
> > [ 2553.305940] ffff89b18582e318 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x70/0x190 [xfs]
> > [ 2553.307066]
> > but task is already holding lock:
> > [ 2553.307682] ffffffffb4324de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x368/0xb10
> > [ 2553.308670]
> > which lock already depends on the new lock.
>
> .....
>
> > [ 2553.342664]  Possible unsafe locking scenario:
> >
> > [ 2553.343621]        CPU0                    CPU1
> > [ 2553.344300]        ----                    ----
> > [ 2553.344957]   lock(fs_reclaim);
> > [ 2553.345510]                                lock(&xfs_nondir_ilock_class);
> > [ 2553.346326]                                lock(fs_reclaim);
> > [ 2553.347015]   rlock(&xfs_nondir_ilock_class);
> > [ 2553.347639]
> >  *** DEADLOCK ***
> >
> > The deadlock is as follows:
> >
> >     CPU0                                  CPU1
> >    ------                                ------
> >
> >   alloc_anon_folio()
> >     vma_alloc_folio(__GFP_FS)
> >      fs_reclaim_acquire(__GFP_FS);
> >        __fs_reclaim_acquire();
> >
> >                                     xfs_attr_list()
> >                                       xfs_ilock()
> >                                       kmalloc(__GFP_FS);
> >                                         __fs_reclaim_acquire();
> >
> >        xfs_ilock
>
> Yet another lockdep false positive. listxattr() is not in a
> transaction context on a referenced inode, so GFP_KERNEL is correct.
> The problem is that lockdep has no clue that fs_reclaim context can
> only lock unreferenced inodes, so we can actually run GFP_KERNEL
> context memory allocation with a locked, referenced inode safely.

Thanks for your detailed explanation.

>
> We typically use __GFP_NOLOCKDEP on these sorts of allocations, but
> the long-term fix is to address the lockdep annotations to take
> reclaim context into account. We can't do that until the realtime
> inode subclasses are removed, which will give us the spare lockdep
> subclasses to add a reclaim context subclass. That is buried in the
> middle of a much larger rework:
>
> https://lore.kernel.org/linux-xfs/172437087542.59588.13853236455832390956.stgit@frogsfrogsfrogs/

Thank you for the reference link. While I’m not able to review the
patchset in detail, I’ll read through it to gain a better understanding.

Patch

diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 7db386304875..0dc4600010b8 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -114,7 +114,7 @@  xfs_attr_shortform_list(
 	 * It didn't all fit, so we have to sort everything on hashval.
 	 */
 	sbsize = sf->count * sizeof(*sbuf);
-	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
+	sbp = sbuf = kmalloc(sbsize, GFP_NOFS | __GFP_NOFAIL);
 
 	/*
 	 * Scan the attribute list for the rest of the entries, storing