diff mbox series

[5/5] xfs: check deferred refcount op continuation parameters

Message ID 166664721743.2690245.17086652152508491843.stgit@magnolia (mailing list archive)
State Superseded
Headers show
Series xfs: improve runtime refcountbt corruption detection | expand

Commit Message

Darrick J. Wong Oct. 24, 2022, 9:33 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

If we're in the middle of a deferred refcount operation and decide to
roll the transaction to avoid overflowing the transaction space, we need
to check the new agbno/aglen parameters that we're about to record in
the new intent.  Specifically, we need to check that the new extent is
completely within the filesystem, and that continuation does not put us
into a different AG.

If the keys of a node block are wrong, the lookup to resume an
xfs_refcount_adjust_extents operation can put us into the wrong record
block.  If this happens, we might not find that we run out of aglen at
an exact record boundary, which will cause the loop control to do the
wrong thing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_refcount.c |   48 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

Comments

Dave Chinner Oct. 26, 2022, 12:46 a.m. UTC | #1
On Mon, Oct 24, 2022 at 02:33:37PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If we're in the middle of a deferred refcount operation and decide to
> roll the transaction to avoid overflowing the transaction space, we need
> to check the new agbno/aglen parameters that we're about to record in
> the new intent.  Specifically, we need to check that the new extent is
> completely within the filesystem, and that continuation does not put us
> into a different AG.

Huh. Why would they not be withing the filesystem or AG, given that
the current transaction should only be operating on an extent within
the current filesystem/AG?

IIUC, this is code intended to catch the sort of refcount irec
startblock corruption that the series fixes, right? If so, shouldn't it be
first in the patch series, not last?

-Dave.
Darrick J. Wong Oct. 26, 2022, 1:28 a.m. UTC | #2
On Wed, Oct 26, 2022 at 11:46:03AM +1100, Dave Chinner wrote:
> On Mon, Oct 24, 2022 at 02:33:37PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > If we're in the middle of a deferred refcount operation and decide to
> > roll the transaction to avoid overflowing the transaction space, we need
> > to check the new agbno/aglen parameters that we're about to record in
> > the new intent.  Specifically, we need to check that the new extent is
> > completely within the filesystem, and that continuation does not put us
> > into a different AG.
> 
> Huh. Why would they not be withing the filesystem or AG, given that
> the current transaction should only be operating on an extent within
> the current filesystem/AG?

Math errors.

Callers of xfs_refcount_adjust_extents are supposed to split (and
update) any refcount records that cross the start or end of the range
that we're adjusting.  This guarantees that _adjust_extents will stop at
exactly the end of a refcount record.

However, if a record in the middle of that range has a blockcount that
is longer than *aglen at the point that we encounter the record, we'll
increment agbno beyond the range that we were supposed to adjust.  This
happens either because we fuzzed the refcountbt deliberately, or ...
because we actually did write the refcountbt record block but due to
bugs on the vm host, the write was silently dropped and memory pressure
caused the xfs_buf to get reclaimed and reloaded with stale contents.

(That's an argument for making xfs_refcount_adjust_extents check that
*aglen never underflows; I'll update the patch.

Oh.  I already wrote that patch, but forgot to add it to this series.
Sigh.)

If agbno is now large enough that the segmented xfs_fsblock_t points
into a nonexistent AG, bad things will happen.

> IIUC, this is code intended to catch the sort of refcount irec
> startblock corruption that the series fixes, right? If so, shouldn't it be
> first in the patch series, not last?

<shrug> Based on reviewer feedback over the last few years I got in the
habit of putting the actual bug fixes at the front of the series.
Patches adding more layers of checking so that we can die gracefully
ended up after that.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 3e6cc1777ffb..a311851a627a 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1180,6 +1180,44 @@  xfs_refcount_finish_one_cleanup(
 		xfs_trans_brelse(tp, agbp);
 }
 
+/*
+ * Set up a continuation a deferred refcount operation by updating the intent.
+ * Checks to make sure we're not going to run off the end of the AG.
+ */
+static inline int
+xfs_refcount_continue_op(
+	struct xfs_btree_cur		*cur,
+	xfs_fsblock_t			startblock,
+	xfs_agblock_t			new_agbno,
+	xfs_extlen_t			new_len,
+	xfs_fsblock_t			*fsbp)
+{
+	struct xfs_mount		*mp = cur->bc_mp;
+	struct xfs_perag		*pag = cur->bc_ag.pag;
+	xfs_fsblock_t			new_fsbno;
+	xfs_agnumber_t			old_agno;
+
+	old_agno = XFS_FSB_TO_AGNO(mp, startblock);
+	new_fsbno = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
+
+	/*
+	 * If we don't have any work left to do, then there's no need
+	 * to perform the validation of the new parameters.
+	 */
+	if (!new_len)
+		goto done;
+
+	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, new_fsbno, new_len)))
+		return -EFSCORRUPTED;
+
+	if (XFS_IS_CORRUPT(mp, old_agno != XFS_FSB_TO_AGNO(mp, new_fsbno)))
+		return -EFSCORRUPTED;
+
+done:
+	*fsbp = new_fsbno;
+	return 0;
+}
+
 /*
  * Process one of the deferred refcount operations.  We pass back the
  * btree cursor to maintain our lock on the btree between calls.
@@ -1247,12 +1285,18 @@  xfs_refcount_finish_one(
 	case XFS_REFCOUNT_INCREASE:
 		error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno,
 				new_len, XFS_REFCOUNT_ADJUST_INCREASE);
-		*new_fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
+		if (error)
+			goto out_drop;
+		error = xfs_refcount_continue_op(rcur, startblock, new_agbno,
+				*new_len, new_fsb);
 		break;
 	case XFS_REFCOUNT_DECREASE:
 		error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno,
 				new_len, XFS_REFCOUNT_ADJUST_DECREASE);
-		*new_fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
+		if (error)
+			goto out_drop;
+		error = xfs_refcount_continue_op(rcur, startblock, new_agbno,
+				*new_len, new_fsb);
 		break;
 	case XFS_REFCOUNT_ALLOC_COW:
 		*new_fsb = startblock + blockcount;