diff mbox series

[3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll

Message ID 166473478893.1083155.2555785331844801316.stgit@magnolia (mailing list archive)
State New, archived
Headers show
Series xfs: fix handling of AG[IF] header buffers during scrub | expand

Commit Message

Darrick J. Wong Oct. 2, 2022, 6:19 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Currently, the only way to lock an allocation group is to hold the AGI
and AGF buffers.  If repair needs to roll the transaction while
repairing some AG metadata, it maintains that lock by holding the two
buffers across the transaction roll and joins them afterwards.

However, repair is not the same as the other parts of XFS that employ
this bhold/bjoin sequence, because it's possible that the AGI or AGF
buffers are not actually dirty before the roll.  In this case, the
buffer log item can detach from the buffer, which means that we have to
re-set the buffer type in the bli after joining the buffer to the new
transaction so that log recovery will know what to do if the fs fails.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/repair.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Dave Chinner Oct. 13, 2022, 10:25 p.m. UTC | #1
On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Currently, the only way to lock an allocation group is to hold the AGI
> and AGF buffers.  If repair needs to roll the transaction while
> repairing some AG metadata, it maintains that lock by holding the two
> buffers across the transaction roll and joins them afterwards.
> 
> However, repair is not the same as the other parts of XFS that employ
> this bhold/bjoin sequence, because it's possible that the AGI or AGF
> buffers are not actually dirty before the roll.  In this case, the
> buffer log item can detach from the buffer, which means that we have to

Doesn't this imply we have a reference counting problem with
XFS_BLI_HOLD buffers? i.e. the bli can only get detached when the
reference count on it goes to zero. If the buffer is clean and
joined to a transaction, then that means the only reference to the
BLI is the current transaction. Hence the only way it can get
detached is for the transaction commit to release the current
transaction's reference to the BLI.

Ah, XFS_BLI_HOLD does not take a reference to the BLI - it just
prevents ->iop_release from releasing the -buffer- after it drops
the transaction reference to the BLI. That's the problem right there
- xfs_buf_item_release() drops the current trans ref to the clean
item via xfs_buf_item_release() regardless of whether BLI_HOLD is
set or not, hence freeing the BLI on clean buffers.

IOWs, it looks to me like XFS_BLI_HOLD should actually hold a
reference to the BLI as well as the buffer so that we don't free the
BLI for a held clean buffer in xfs_buf_item_release(). The reference
we leave behind will then be picked up by the subsequent call to
xfs_trans_bjoin() which finds the clean BLI already attached to the
buffer...

Cheers,

Dave.
Darrick J. Wong Oct. 13, 2022, 11:19 p.m. UTC | #2
On Fri, Oct 14, 2022 at 09:25:53AM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Currently, the only way to lock an allocation group is to hold the AGI
> > and AGF buffers.  If repair needs to roll the transaction while
> > repairing some AG metadata, it maintains that lock by holding the two
> > buffers across the transaction roll and joins them afterwards.
> > 
> > However, repair is not the same as the other parts of XFS that employ
> > this bhold/bjoin sequence, because it's possible that the AGI or AGF
> > buffers are not actually dirty before the roll.  In this case, the
> > buffer log item can detach from the buffer, which means that we have to
> 
> Doesn't this imply we have a reference counting problem with
> XFS_BLI_HOLD buffers? i.e. the bli can only get detached when the
> reference count on it goes to zero. If the buffer is clean and
> joined to a transaction, then that means the only reference to the
> BLI is the current transaction. Hence the only way it can get
> detached is for the transaction commit to release the current
> transaction's reference to the BLI.
> 
> Ah, XFS_BLI_HOLD does not take a reference to the BLI - it just
> prevents ->iop_release from releasing the -buffer- after it drops
> the transaction reference to the BLI. That's the problem right there
> - xfs_buf_item_release() drops the current trans ref to the clean
> item via xfs_buf_item_release() regardless of whether BLI_HOLD is
> set or not, hence freeing the BLI on clean buffers.
> 
> IOWs, it looks to me like XFS_BLI_HOLD should actually hold a
> reference to the BLI as well as the buffer so that we don't free the
> BLI for a held clean buffer in xfs_buf_item_release(). The reference
> we leave behind will then be picked up by the subsequent call to
> xfs_trans_bjoin() which finds the clean BLI already attached to the
> buffer...

<nod> I think you're saying that _xfs_trans_bjoin should:

	if (!(bip->bli_flags & XFS_BLI_HOLD))
		atomic_inc(&bip->bli_refcount);

and xfs_buf_item_release should do:

	if (hold)
		return;
	released = xfs_buf_item_put(bip);
	if (stale && !released)
		return;

I'll have to remember how I induced this error in the first place.  I
think it was when I was running repair tests with mem=600M.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner Oct. 14, 2022, 1:28 a.m. UTC | #3
On Thu, Oct 13, 2022 at 04:19:15PM -0700, Darrick J. Wong wrote:
> On Fri, Oct 14, 2022 at 09:25:53AM +1100, Dave Chinner wrote:
> > On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Currently, the only way to lock an allocation group is to hold the AGI
> > > and AGF buffers.  If repair needs to roll the transaction while
> > > repairing some AG metadata, it maintains that lock by holding the two
> > > buffers across the transaction roll and joins them afterwards.
> > > 
> > > However, repair is not the same as the other parts of XFS that employ
> > > this bhold/bjoin sequence, because it's possible that the AGI or AGF
> > > buffers are not actually dirty before the roll.  In this case, the
> > > buffer log item can detach from the buffer, which means that we have to
> > 
> > Doesn't this imply we have a reference counting problem with
> > XFS_BLI_HOLD buffers? i.e. the bli can only get detached when the
> > reference count on it goes to zero. If the buffer is clean and
> > joined to a transaction, then that means the only reference to the
> > BLI is the current transaction. Hence the only way it can get
> > detached is for the transaction commit to release the current
> > transaction's reference to the BLI.
> > 
> > Ah, XFS_BLI_HOLD does not take a reference to the BLI - it just
> > prevents ->iop_release from releasing the -buffer- after it drops
> > the transaction reference to the BLI. That's the problem right there
> > - xfs_buf_item_release() drops the current trans ref to the clean
> > item via xfs_buf_item_release() regardless of whether BLI_HOLD is
> > set or not, hence freeing the BLI on clean buffers.
> > 
> > IOWs, it looks to me like XFS_BLI_HOLD should actually hold a
> > reference to the BLI as well as the buffer so that we don't free the
> > BLI for a held clean buffer in xfs_buf_item_release(). The reference
> > we leave behind will then be picked up by the subsequent call to
> > xfs_trans_bjoin() which finds the clean BLI already attached to the
> > buffer...
> 
> <nod> I think you're saying that _xfs_trans_bjoin should:
> 
> 	if (!(bip->bli_flags & XFS_BLI_HOLD))
> 		atomic_inc(&bip->bli_refcount);
> 
> and xfs_buf_item_release should do:
> 
> 	if (hold)
> 		return;
> 	released = xfs_buf_item_put(bip);
> 	if (stale && !released)
> 		return;

Not exactly. What I was thinking was something like this:

xfs_trans_bhold() should do:

	bip->bli_flags |= XFS_BLI_HOLD;
	atomic_inc(&bip->bli_refcount);

xfs_trans_bhold_release() should do:

	bip->bli_flags &= ~XFS_BLI_HOLD;
	atomic_dec(&bip->bli_refcount);

xfs_trans_brelse() shoudl do:

	if (bip->bli_flags & XFS_BLI_HOLD) {
		bip->bli_flags &= ~XFS_BLI_HOLD;
		atomic_dec(&bip->bli_refcount);
	}

and xfs_buf_item_release() should do:

	if (hold) {
		/* Release the hold ref but not the rolling transaction ref */
		bip->bli_flags &= ~XFS_BLI_HOLD;
		atomic_dec(&bip->bli_refcount);
		return;
	}
	released = xfs_buf_item_put(bip);
	if (stale && !released)
		return;

Then _xfs_trans_bjoin() remains unchanged, as we don't remove the
BLI from the held clean buffer as there is still a reference to it.
The new transaction we rejoin the buffer to will take that
reference.

Cheers,

Dave.
Darrick J. Wong Oct. 24, 2022, 4:16 a.m. UTC | #4
On Fri, Oct 14, 2022 at 12:28:19PM +1100, Dave Chinner wrote:
> On Thu, Oct 13, 2022 at 04:19:15PM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 14, 2022 at 09:25:53AM +1100, Dave Chinner wrote:
> > > On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Currently, the only way to lock an allocation group is to hold the AGI
> > > > and AGF buffers.  If repair needs to roll the transaction while
> > > > repairing some AG metadata, it maintains that lock by holding the two
> > > > buffers across the transaction roll and joins them afterwards.
> > > > 
> > > > However, repair is not the same as the other parts of XFS that employ
> > > > this bhold/bjoin sequence, because it's possible that the AGI or AGF
> > > > buffers are not actually dirty before the roll.  In this case, the
> > > > buffer log item can detach from the buffer, which means that we have to
> > > 
> > > Doesn't this imply we have a reference counting problem with
> > > XFS_BLI_HOLD buffers? i.e. the bli can only get detached when the
> > > reference count on it goes to zero. If the buffer is clean and
> > > joined to a transaction, then that means the only reference to the
> > > BLI is the current transaction. Hence the only way it can get
> > > detached is for the transaction commit to release the current
> > > transaction's reference to the BLI.
> > > 
> > > Ah, XFS_BLI_HOLD does not take a reference to the BLI - it just
> > > prevents ->iop_release from releasing the -buffer- after it drops
> > > the transaction reference to the BLI. That's the problem right there
> > > - xfs_buf_item_release() drops the current trans ref to the clean
> > > item via xfs_buf_item_release() regardless of whether BLI_HOLD is
> > > set or not, hence freeing the BLI on clean buffers.
> > > 
> > > IOWs, it looks to me like XFS_BLI_HOLD should actually hold a
> > > reference to the BLI as well as the buffer so that we don't free the
> > > BLI for a held clean buffer in xfs_buf_item_release(). The reference
> > > we leave behind will then be picked up by the subsequent call to
> > > xfs_trans_bjoin() which finds the clean BLI already attached to the
> > > buffer...
> > 
> > <nod> I think you're saying that _xfs_trans_bjoin should:
> > 
> > 	if (!(bip->bli_flags & XFS_BLI_HOLD))
> > 		atomic_inc(&bip->bli_refcount);
> > 
> > and xfs_buf_item_release should do:
> > 
> > 	if (hold)
> > 		return;
> > 	released = xfs_buf_item_put(bip);
> > 	if (stale && !released)
> > 		return;
> 
> Not exactly. What I was thinking was something like this:
> 
> xfs_trans_bhold() should do:
> 
> 	bip->bli_flags |= XFS_BLI_HOLD;
> 	atomic_inc(&bip->bli_refcount);
> 
> xfs_trans_bhold_release() should do:
> 
> 	bip->bli_flags &= ~XFS_BLI_HOLD;
> 	atomic_dec(&bip->bli_refcount);
> 
> xfs_trans_brelse() shoudl do:
> 
> 	if (bip->bli_flags & XFS_BLI_HOLD) {
> 		bip->bli_flags &= ~XFS_BLI_HOLD;
> 		atomic_dec(&bip->bli_refcount);
> 	}
> 
> and xfs_buf_item_release() should do:
> 
> 	if (hold) {
> 		/* Release the hold ref but not the rolling transaction ref */
> 		bip->bli_flags &= ~XFS_BLI_HOLD;
> 		atomic_dec(&bip->bli_refcount);
> 		return;
> 	}
> 	released = xfs_buf_item_put(bip);
> 	if (stale && !released)
> 		return;
> 
> Then _xfs_trans_bjoin() remains unchanged, as we don't remove the
> BLI from the held clean buffer as there is still a reference to it.
> The new transaction we rejoin the buffer to will take that
> reference.

Hnmmmm.  I tried this, but it lead to massive memory corruption
problems, slab leak warnings, and a hang.  If I get around to it I'll
try to see if KASAN will help me figure out what went wrong.

It occurred to me that it might be easier to log the first byte of the
AGI and AGF buffers before the transaction roll, which prevents the bli
from falling off the buffer during the transaction rolling.

Yes that isn't fixing the problem that the unusual use of bhold on a
clean buffer doesn't work right, but <shrug> I don't want to increase
the risks of this patchset by reworking buffer/bli lifetimes when I can
do something simpler. :)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 2ada7fc1c398..92c661b98892 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -138,11 +138,23 @@  xrep_roll_ag_trans(
 	if (error)
 		return error;
 
-	/* Join AG headers to the new transaction. */
-	if (sc->sa.agi_bp)
+	/*
+	 * Join AG headers to the new transaction.  The buffer log item can
+	 * detach from the buffer across the transaction roll if the bli is
+	 * clean, so ensure the buffer type is still set on the AG header
+	 * buffers' blis before we return.
+	 *
+	 * Normal code would never hold a clean buffer across a roll, but scrub
+	 * needs both buffers to maintain a total lock on the AG.
+	 */
+	if (sc->sa.agi_bp) {
 		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
-	if (sc->sa.agf_bp)
+		xfs_trans_buf_set_type(sc->tp, sc->sa.agi_bp, XFS_BLFT_AGI_BUF);
+	}
+	if (sc->sa.agf_bp) {
 		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
+		xfs_trans_buf_set_type(sc->tp, sc->sa.agf_bp, XFS_BLFT_AGF_BUF);
+	}
 
 	return 0;
 }