xfs: shrink failure needs to hold AGI buffer

Message ID 20240306011246.1631906-1-david@fromorbit.com
State Superseded
Series xfs: shrink failure needs to hold AGI buffer

Commit Message

Dave Chinner March 6, 2024, 1:12 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

Chandan reported an AGI/AGF lock order hang on xfs/168 during recent
testing. The cause of the problem was the task running xfs_growfs
to shrink the filesystem. A failure occurred trying to remove the
free space from the btrees that the shrink would make disappear,
and that meant it ran the error handling for a partial failure.

This error path involves restoring the per-ag block reservations,
and that requires calculating the amount of space needed to be
reserved for the free inode btree. The growfs operation hung here:

[18679.536829]  down+0x71/0xa0
[18679.537657]  xfs_buf_lock+0xa4/0x290 [xfs]
[18679.538731]  xfs_buf_find_lock+0xf7/0x4d0 [xfs]
[18679.539920]  xfs_buf_lookup.constprop.0+0x289/0x500 [xfs]
[18679.542628]  xfs_buf_get_map+0x2b3/0xe40 [xfs]
[18679.547076]  xfs_buf_read_map+0xbb/0x900 [xfs]
[18679.562616]  xfs_trans_read_buf_map+0x449/0xb10 [xfs]
[18679.569778]  xfs_read_agi+0x1cd/0x500 [xfs]
[18679.573126]  xfs_ialloc_read_agi+0xc2/0x5b0 [xfs]
[18679.578708]  xfs_finobt_calc_reserves+0xe7/0x4d0 [xfs]
[18679.582480]  xfs_ag_resv_init+0x2c5/0x490 [xfs]
[18679.586023]  xfs_ag_shrink_space+0x736/0xd30 [xfs]
[18679.590730]  xfs_growfs_data_private.isra.0+0x55e/0x990 [xfs]
[18679.599764]  xfs_growfs_data+0x2f1/0x410 [xfs]
[18679.602212]  xfs_file_ioctl+0xd1e/0x1370 [xfs]

trying to get the AGI lock. The AGI lock was held by an fsstress task
doing an inode allocation, which was waiting on the AGF lock to
allocate a new inode chunk on disk. Hence deadlock.
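To make the inversion concrete, here is a minimal C sketch; it is not XFS code. It assumes a simplified model whose only rule is that, within one AG, the AGI lock is taken before the AGF lock, and the names `acquire()`, `inode_alloc_path()` and `unfixed_shrink_error_path()` are hypothetical. A task that asks for the AGI while already holding the AGF is flagged as an order inversion, which is the state the growfs task was in when it hung.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical lock-order model (not XFS code): within one AG the
 * AGI buffer lock ranks before the AGF buffer lock.
 */
enum ag_lock { AGI = 0, AGF = 1, NR_AG_LOCKS };

static const char *const lock_name[NR_AG_LOCKS] = { "AGI", "AGF" };

struct task {
	const char *name;
	bool held[NR_AG_LOCKS];
};

/*
 * Take lock @l for task @t. Returns false, without taking it, if @t
 * already holds a lock that ranks after @l (an AGI/AGF inversion).
 */
static bool acquire(struct task *t, enum ag_lock l)
{
	assert(l < NR_AG_LOCKS);
	for (int i = l + 1; i < NR_AG_LOCKS; i++) {
		if (t->held[i]) {
			printf("%s: wants %s while holding %s: order inversion\n",
			       t->name, lock_name[l], lock_name[i]);
			return false;
		}
	}
	t->held[l] = true;
	return true;
}

/* fsstress inode allocation: AGI first, then AGF for a new inode chunk. */
static bool inode_alloc_path(struct task *t)
{
	return acquire(t, AGI) && acquire(t, AGF);
}

/*
 * Unfixed shrink error path: the AGF is held across the roll, the AGI
 * was released by it, and xfs_ag_resv_init() then needs the AGI again.
 */
static bool unfixed_shrink_error_path(struct task *t)
{
	return acquire(t, AGF) && acquire(t, AGI);
}
```

Each path is fine on its own; the deadlock needs both tasks running against the same AG at once, with each holding the lock the other wants.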

The fix is for the growfs code to hold the AGI, as well as the AGF,
across the transaction roll it does in the error path. It already
holds the AGF locked across the roll but lets the AGI go, so the
xfs_ag_resv_init() call then has to take the AGI lock while holding
the AGF, and that is the lock order inversion.
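The effect of holding the AGI across the roll can be sketched with a heavily simplified, hypothetical transaction model (again not the kernel's data structures). In this model a roll releases every joined buffer except those marked with bhold, and reading a buffer that is already joined to the transaction reuses it without taking the lock again, mirroring how xfs_trans_read_buf_map() returns a buffer already attached to the transaction. The helper names are illustrative only; `shrink_error_path()` returns how many times the AGI lock has to be re-acquired after the roll.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified transaction model (not XFS code).
 * A joined buffer stays locked until the transaction commits. A roll
 * commits and starts a new transaction; buffers marked with bhold keep
 * their lock across the commit and must be bjoin()ed again.
 */
struct buf {
	bool locked;            /* buffer lock is held */
	bool joined;            /* attached to the current transaction */
	bool hold;              /* keep locked across the next roll */
	int lock_acquisitions;  /* times the buffer lock was taken */
};

static void trans_bhold(struct buf *bp)
{
	assert(bp->joined);
	bp->hold = true;
}

/* Commit the current transaction: unlock every joined, non-held buffer. */
static void trans_roll(struct buf *bufs[], int n)
{
	for (int i = 0; i < n; i++) {
		struct buf *bp = bufs[i];

		if (!bp->joined)
			continue;
		bp->joined = false;
		if (bp->hold)
			bp->hold = false;       /* still locked, caller owns it */
		else
			bp->locked = false;
	}
}

/* Reattach an already-locked buffer; no lock is taken here. */
static void trans_bjoin(struct buf *bp)
{
	assert(bp->locked && !bp->joined);
	bp->joined = true;
}

/* Read a buffer in the transaction, reusing it if it is already joined. */
static void trans_read_buf(struct buf *bp)
{
	if (bp->joined)
		return;
	assert(!bp->locked);    /* this model has no waiters */
	bp->locked = true;
	bp->lock_acquisitions++;
	bp->joined = true;
}

/* Returns how often the AGI read after the roll had to take the lock. */
static int shrink_error_path(bool hold_agi)
{
	struct buf agi = { 0 }, agf = { 0 };
	struct buf *bufs[] = { &agi, &agf };

	/* In this model, AGI then AGF are locked at the start of the shrink. */
	trans_read_buf(&agi);
	trans_read_buf(&agf);

	trans_bhold(&agf);
	if (hold_agi)
		trans_bhold(&agi);
	trans_roll(bufs, 2);
	trans_bjoin(&agf);
	if (hold_agi)
		trans_bjoin(&agi);

	int before = agi.lock_acquisitions;
	trans_read_buf(&agi);   /* stands in for xfs_ialloc_read_agi() */
	return agi.lock_acquisitions - before;
}
```

With `hold_agi` false the AGI has to be locked again while the AGF is still held, which is the inversion; with `hold_agi` true the AGI read is satisfied from the transaction and no new lock is taken.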

Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Fixes: 46141dc891f7 ("xfs: introduce xfs_ag_shrink_space()")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Dave Chinner March 6, 2024, 4:40 a.m. UTC | #1
On Wed, Mar 06, 2024 at 10:33:16AM +0800, Wang Yugui wrote:
> Hi,
> 
> 
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> > index d728709054b2..dc1873f76bff 100644
> > --- a/fs/xfs/libxfs/xfs_ag.c
> > +++ b/fs/xfs/libxfs/xfs_ag.c
> > @@ -975,14 +975,23 @@ xfs_ag_shrink_space(
> >  
> >  	if (error) {
> >  		/*
> > -		 * if extent allocation fails, need to roll the transaction to
> > +		 * If extent allocation fails, need to roll the transaction to
> >  		 * ensure that the AGFL fixup has been committed anyway.
> > +		 *
> > +		 * We need to hold the AGF across the roll to ensure nothing can
> > +		 * access the AG for allocation until the shrink is fully
> > +		 * cleaned up. And due to the resetting of the AG block
> > +		 * reservation space needing to lock the AGI, we also have to
> > +		 * hold that so we don't get AGI/AGF lock order inversions in
> > +		 * the error handling path.
> >  		 */
> >  		xfs_trans_bhold(*tpp, agfbp);
> > +		xfs_trans_bhold(*tpp, agibp);
> >  		err2 = xfs_trans_roll(tpp);
> >  		if (err2)
> >  			return err2;
> >  		xfs_trans_bjoin(*tpp, agfbp);
> > +		xfs_trans_bjoin(*tpp, agibp);
> >  		goto resv_init_out;
> 
> Should ‘xfs_trans_bjoin(*tpp, agibp)’ be done
> before ‘xfs_trans_bjoin(*tpp, agfbp)’?

It doesn't matter.

-Dave.
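One way to read the answer above: after the roll both buffers are still locked by the caller because they were held, and xfs_trans_bjoin() only attaches an already-locked buffer to the new transaction. It does not take the buffer lock, so there is no lock order to violate. The minimal sketch below models just that, using hypothetical names (`rejoin_after_roll()` and a simplified `struct buf`); it is an illustration, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer state (not the kernel's struct xfs_buf). */
struct buf {
	bool locked;
	bool joined;
	int lock_acquisitions;
};

/* Attach an already-locked buffer to the transaction; never takes a lock. */
static void trans_bjoin(struct buf *bp)
{
	assert(bp->locked && !bp->joined);
	bp->joined = true;
}

struct ag_bufs {
	struct buf agi;
	struct buf agf;
};

/*
 * State after the roll: both buffers were held, so both are still
 * locked (each lock taken once, before the roll) but not yet joined.
 */
static struct ag_bufs rejoin_after_roll(bool agi_first)
{
	struct ag_bufs b = {
		.agi = { .locked = true, .lock_acquisitions = 1 },
		.agf = { .locked = true, .lock_acquisitions = 1 },
	};

	if (agi_first) {
		trans_bjoin(&b.agi);
		trans_bjoin(&b.agf);
	} else {
		trans_bjoin(&b.agf);
		trans_bjoin(&b.agi);
	}
	return b;
}
```

Either order ends with both buffers joined and no extra lock acquisitions, which is why the rejoin order is irrelevant here.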
Gao Xiang March 6, 2024, 4:54 a.m. UTC | #2
On 2024/3/6 09:12, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> Reported-by: Chandan Babu R <chandanbabu@kernel.org>
> Fixes: 46141dc891f7 ("xfs: introduce xfs_ag_shrink_space()")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang
Christoph Hellwig March 6, 2024, 12:35 p.m. UTC | #3
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Patch

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index d728709054b2..dc1873f76bff 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -975,14 +975,23 @@ xfs_ag_shrink_space(
 
 	if (error) {
 		/*
-		 * if extent allocation fails, need to roll the transaction to
+		 * If extent allocation fails, need to roll the transaction to
 		 * ensure that the AGFL fixup has been committed anyway.
+		 *
+		 * We need to hold the AGF across the roll to ensure nothing can
+		 * access the AG for allocation until the shrink is fully
+		 * cleaned up. And due to the resetting of the AG block
+		 * reservation space needing to lock the AGI, we also have to
+		 * hold that so we don't get AGI/AGF lock order inversions in
+		 * the error handling path.
 		 */
 		xfs_trans_bhold(*tpp, agfbp);
+		xfs_trans_bhold(*tpp, agibp);
 		err2 = xfs_trans_roll(tpp);
 		if (err2)
 			return err2;
 		xfs_trans_bjoin(*tpp, agfbp);
+		xfs_trans_bjoin(*tpp, agibp);
 		goto resv_init_out;
 	}