diff mbox series

[2/6] xfs: check rt summary file geometry more thoroughly

Message ID 170086928377.2771542.14818456920992275639.stgit@frogsfrogsfrogs (mailing list archive)
State New, archived
Headers show
Series xfs: online repair of rt bitmap file | expand

Commit Message

Darrick J. Wong Nov. 24, 2023, 11:54 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

I forgot that the xfs_mount tracks the size and number of levels in the
realtime summary file, and that the rt summary file can have more blocks
mapped to the data fork than m_rsumsize implies if growfsrt fails.

So.  Add to the rtsummary scrubber an explicit check that all the
summary geometry values are correct, then adjust the rtsummary i_size
checks to allow for the growfsrt failure case.  Finally, flag post-eof
blocks in the summary file.

While we're at it, split the extent map checking so that we only call
xfs_bmapi_read once per extent instead of once per rtsummary block.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/rtsummary.c |  132 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 105 insertions(+), 27 deletions(-)

Comments

Christoph Hellwig Nov. 28, 2023, 2:05 p.m. UTC | #1
> +	/*
> +	 * Now that we've locked the rtbitmap and rtsummary, we can't race with
> +	 * growfsrt trying to expand the summary or change the size of the rt
> +	 * volume.  Hence it is safe to compute and check the geometry values.
> +	 */
> +	rts->rextents = xfs_rtb_to_rtx(mp, mp->m_sb.sb_rblocks);
> +	rts->rbmblocks = xfs_rtbitmap_blockcount(mp, rts->rextents);
> +	rts->rsumlevels = rts->rextents ? xfs_highbit32(rts->rextents) + 1 : 0;
> +	rsumblocks = xfs_rtsummary_blockcount(mp, rts->rsumlevels,
> +			rts->rbmblocks);
> +	rts->rsumsize = XFS_FSB_TO_B(mp, rsumblocks);

Same nitpick as for the last patch.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong Nov. 28, 2023, 11:30 p.m. UTC | #2
On Tue, Nov 28, 2023 at 06:05:48AM -0800, Christoph Hellwig wrote:
> > +	/*
> > +	 * Now that we've locked the rtbitmap and rtsummary, we can't race with
> > +	 * growfsrt trying to expand the summary or change the size of the rt
> > +	 * volume.  Hence it is safe to compute and check the geometry values.
> > +	 */
> > +	rts->rextents = xfs_rtb_to_rtx(mp, mp->m_sb.sb_rblocks);
> > +	rts->rbmblocks = xfs_rtbitmap_blockcount(mp, rts->rextents);
> > +	rts->rsumlevels = rts->rextents ? xfs_highbit32(rts->rextents) + 1 : 0;
> > +	rsumblocks = xfs_rtsummary_blockcount(mp, rts->rsumlevels,
> > +			rts->rbmblocks);
> > +	rts->rsumsize = XFS_FSB_TO_B(mp, rsumblocks);
> 
> Same nitpick as for the last patch.

LOL so I just tried a 64k rt volume with a 1M rextsize and mkfs crashed.
I guess I'll go sort out what's going on there...

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D
Darrick J. Wong Nov. 29, 2023, 1:23 a.m. UTC | #3
On Tue, Nov 28, 2023 at 03:30:09PM -0800, Darrick J. Wong wrote:
> On Tue, Nov 28, 2023 at 06:05:48AM -0800, Christoph Hellwig wrote:
> > > +	/*
> > > +	 * Now that we've locked the rtbitmap and rtsummary, we can't race with
> > > +	 * growfsrt trying to expand the summary or change the size of the rt
> > > +	 * volume.  Hence it is safe to compute and check the geometry values.
> > > +	 */
> > > +	rts->rextents = xfs_rtb_to_rtx(mp, mp->m_sb.sb_rblocks);
> > > +	rts->rbmblocks = xfs_rtbitmap_blockcount(mp, rts->rextents);
> > > +	rts->rsumlevels = rts->rextents ? xfs_highbit32(rts->rextents) + 1 : 0;
> > > +	rsumblocks = xfs_rtsummary_blockcount(mp, rts->rsumlevels,
> > > +			rts->rbmblocks);
> > > +	rts->rsumsize = XFS_FSB_TO_B(mp, rsumblocks);
> > 
> > Same nitpick as for the last patch.
> 
> LOL so I just tried a 64k rt volume with a 1M rextsize and mkfs crashed.
> I guess I'll go sort out what's going on there...

Oh good, XFS has been broken since the beginning of git history for the
stupid corner case where the rtblocks < rextsize.  In this case, mkfs
will set sb_rextents and sb_rextslog both to zero:

	sbp->sb_rextslog =
		(uint8_t)(rtextents ?
			libxfs_highbit32((unsigned int)rtextents) : 0);

However, that's not the check that xfs_repair uses for nonzero rtblocks:

	if (sb->sb_rextslog !=
			libxfs_highbit32((unsigned int)sb->sb_rextents))

The difference here is that xfs_highbit32 returns -1 if its argument is
zero, which means that for a runt rt volume, repair thinks the "correct"
value of rextslog is -1, even though mkfs wrote it as 0 and flags a
freshly formatted filesystem as corrupt.  Because mkfs has been writing
ondisk artifacts like this for decades, we have to accept that as
"correct".  TBH, zero rextslog for zero rtextents makes more sense to me
anyway.

Regrettably, the superblock verifier checks created in commit copied
xfs_repair even though mkfs has been writing out such filesystems for
ages.  In testing /that/ fix, I discovered that the logic above is
wrong -- rsumlevels is always rextslog + 1 when rblocks > 0, even if
rextents == 0.

IOWs, this logic needs to be:

	/*
	 * Now that we've locked the rtbitmap and rtsummary, we can't race with
	 * growfsrt trying to expand the summary or change the size of the rt
	 * volume.  Hence it is safe to compute and check the geometry values.
	 */
	if (mp->m_sb.sb_rblocks) {
		xfs_filblks_t	rsumblocks;
		int		rextslog = 0;

		rts->rextents = xfs_rtb_to_rtx(mp, mp->m_sb.sb_rblocks);
		if (rts->rextents)
			rextslog = xfs_highbit32(rts->rextents);
		rts->rsumlevels = rextslog + 1;
		rts->rbmblocks = xfs_rtbitmap_blockcount(mp, rts->rextents);
		rsumblocks = xfs_rtsummary_blockcount(mp, rts->rsumlevels,
				rts->rbmblocks);
		rts->rsumsize = XFS_FSB_TO_B(mp, rsumblocks);
	}

Yay winning.

--D

> > Otherwise looks good:
> > 
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Thanks!
> 
> --D
>
Christoph Hellwig Nov. 29, 2023, 6:05 a.m. UTC | #4
On Tue, Nov 28, 2023 at 03:30:09PM -0800, Darrick J. Wong wrote:
> LOL so I just tried a 64k rt volume with a 1M rextsize and mkfs crashed.
> I guess I'll go sort out what's going on there...

I think we should just reject rt device size < rtextsize configs in
the kernel and all tools.
Darrick J. Wong Nov. 29, 2023, 6:21 a.m. UTC | #5
On Tue, Nov 28, 2023 at 10:05:49PM -0800, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 03:30:09PM -0800, Darrick J. Wong wrote:
> > LOL so I just tried a 64k rt volume with a 1M rextsize and mkfs crashed.
> > I guess I'll go sort out what's going on there...
> 
> I think we should just reject rt device size < rtextsize configs in
> the kernel and all tools.

"But that could break old weirdass customer filesystems."

The design of rtgroups prohibits that, so we're ok going forward.

--D
Christoph Hellwig Nov. 29, 2023, 6:23 a.m. UTC | #6
On Tue, Nov 28, 2023 at 10:21:55PM -0800, Darrick J. Wong wrote:
> On Tue, Nov 28, 2023 at 10:05:49PM -0800, Christoph Hellwig wrote:
> > On Tue, Nov 28, 2023 at 03:30:09PM -0800, Darrick J. Wong wrote:
> > > LOL so I just tried a 64k rt volume with a 1M rextsize and mkfs crashed.
> > > I guess I'll go sort out what's going on there...
> > 
> > I think we should just reject rt device size < rtextsize configs in
> > the kernel and all tools.
> 
> "But that could break old weirdass customer filesystems."
> 
> The design of rtgroups prohibits that, so we're ok going forward.

Well, as you just said it hasn't mounted for a long time, and really
this is a corner case that just doesn't make any sense.  I'd really
prefer to cleanly reject it, and if someone really complains with a good
reason we can revisit the decisions.  But I strongly doubt it's ever
going to happen.
Darrick J. Wong Nov. 30, 2023, 12:10 a.m. UTC | #7
On Tue, Nov 28, 2023 at 10:23:41PM -0800, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 10:21:55PM -0800, Darrick J. Wong wrote:
> > On Tue, Nov 28, 2023 at 10:05:49PM -0800, Christoph Hellwig wrote:
> > > On Tue, Nov 28, 2023 at 03:30:09PM -0800, Darrick J. Wong wrote:
> > > > LOL so I just tried a 64k rt volume with a 1M rextsize and mkfs crashed.
> > > > I guess I'll go sort out what's going on there...
> > > 
> > > I think we should just reject rt device size < rtextsize configs in
> > > the kernel and all tools.
> > 
> > "But that could break old weirdass customer filesystems."
> > 
> > The design of rtgroups prohibits that, so we're ok going forward.
> 
> Well, as you just said it hasn't mounted for a long time, and really
> this is a corner case that just doesn't make any sense.  I'd really
> prefer to cleanly reject it, and if someone really complains with a good
> reason we can revisit the decisions.  But I strongly doubt it's ever
> going to happen.

Oh, even better, Dave and I noticed today that if you format a 17G
realtime volume (> 2^32 rt extents) then mkfs fails because there's an
integer overflow:

https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/mkfs/xfs_mkfs.c#n3739

Based on your observation that rt free space never exceeds the group
length with rtgroups turned on, I'll tweak the sb_rextslog computation
so that it's computed with (rgblocks / rextsize) instead of (rblocks /
rextsize) which will fix that problem for future filesystems.

--D
diff mbox series

Patch

diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c
index f94800a029f35..41f64158c8626 100644
--- a/fs/xfs/scrub/rtsummary.c
+++ b/fs/xfs/scrub/rtsummary.c
@@ -31,6 +31,18 @@ 
  * (potentially large) amount of data in pageable memory.
  */
 
+struct xchk_rtsummary {
+	struct xfs_rtalloc_args	args;
+
+	uint64_t		rextents;
+	uint64_t		rbmblocks;
+	uint64_t		rsumsize;
+	unsigned int		rsumlevels;
+
+	/* Memory buffer for the summary comparison. */
+	union xfs_suminfo_raw	words[];
+};
+
 /* Set us up to check the rtsummary file. */
 int
 xchk_setup_rtsummary(
@@ -38,8 +50,16 @@  xchk_setup_rtsummary(
 {
 	struct xfs_mount	*mp = sc->mp;
 	char			*descr;
+	struct xchk_rtsummary	*rts;
+	xfs_filblks_t		rsumblocks;
 	int			error;
 
+	rts = kvzalloc(struct_size(rts, words, mp->m_blockwsize),
+			XCHK_GFP_FLAGS);
+	if (!rts)
+		return -ENOMEM;
+	sc->buf = rts;
+
 	/*
 	 * Create an xfile to construct a new rtsummary file.  The xfile allows
 	 * us to avoid pinning kernel memory for this purpose.
@@ -54,11 +74,6 @@  xchk_setup_rtsummary(
 	if (error)
 		return error;
 
-	/* Allocate a memory buffer for the summary comparison. */
-	sc->buf = kvmalloc(mp->m_sb.sb_blocksize, XCHK_GFP_FLAGS);
-	if (!sc->buf)
-		return -ENOMEM;
-
 	error = xchk_install_live_inode(sc, mp->m_rsumip);
 	if (error)
 		return error;
@@ -75,13 +90,23 @@  xchk_setup_rtsummary(
 	 */
 	xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
 	xchk_ilock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
+
+	/*
+	 * Now that we've locked the rtbitmap and rtsummary, we can't race with
+	 * growfsrt trying to expand the summary or change the size of the rt
+	 * volume.  Hence it is safe to compute and check the geometry values.
+	 */
+	rts->rextents = xfs_rtb_to_rtx(mp, mp->m_sb.sb_rblocks);
+	rts->rbmblocks = xfs_rtbitmap_blockcount(mp, rts->rextents);
+	rts->rsumlevels = rts->rextents ? xfs_highbit32(rts->rextents) + 1 : 0;
+	rsumblocks = xfs_rtsummary_blockcount(mp, rts->rsumlevels,
+			rts->rbmblocks);
+	rts->rsumsize = XFS_FSB_TO_B(mp, rsumblocks);
 	return 0;
 }
 
 /* Helper functions to record suminfo words in an xfile. */
 
-typedef unsigned int xchk_rtsumoff_t;
-
 static inline int
 xfsum_load(
 	struct xfs_scrub	*sc,
@@ -192,19 +217,29 @@  STATIC int
 xchk_rtsum_compare(
 	struct xfs_scrub	*sc)
 {
-	struct xfs_rtalloc_args args = {
-		.mp		= sc->mp,
-		.tp		= sc->tp,
-	};
-	struct xfs_mount	*mp = sc->mp;
 	struct xfs_bmbt_irec	map;
-	xfs_fileoff_t		off;
-	xchk_rtsumoff_t		sumoff = 0;
-	int			nmap;
+	struct xfs_iext_cursor	icur;
 
-	for (off = 0; off < XFS_B_TO_FSB(mp, mp->m_rsumsize); off++) {
-		union xfs_suminfo_raw *ondisk_info;
-		int		error = 0;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_inode	*ip = sc->ip;
+	struct xchk_rtsummary	*rts = sc->buf;
+	xfs_fileoff_t		off = 0;
+	xfs_fileoff_t		endoff;
+	xfs_rtsumoff_t		sumoff = 0;
+	int			error = 0;
+
+	rts->args.mp = sc->mp;
+	rts->args.tp = sc->tp;
+
+	/* Mappings may not cross or lie beyond EOF. */
+	endoff = XFS_B_TO_FSB(mp, ip->i_disk_size);
+	if (xfs_iext_lookup_extent(ip, &ip->i_df, endoff, &icur, &map)) {
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, endoff);
+		return 0;
+	}
+
+	while (off < endoff) {
+		int		nmap = 1;
 
 		if (xchk_should_terminate(sc, &error))
 			return error;
@@ -212,8 +247,7 @@  xchk_rtsum_compare(
 			return 0;
 
 		/* Make sure we have a written extent. */
-		nmap = 1;
-		error = xfs_bmapi_read(mp->m_rsumip, off, 1, &map, &nmap,
+		error = xfs_bmapi_read(ip, off, endoff - off, &map, &nmap,
 				XFS_DATA_FORK);
 		if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, off, &error))
 			return error;
@@ -223,24 +257,33 @@  xchk_rtsum_compare(
 			return 0;
 		}
 
+		off += map.br_blockcount;
+	}
+
+	for (off = 0; off < endoff; off++) {
+		union xfs_suminfo_raw	*ondisk_info;
+
 		/* Read a block's worth of ondisk rtsummary file. */
-		error = xfs_rtsummary_read_buf(&args, off);
+		error = xfs_rtsummary_read_buf(&rts->args, off);
 		if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, off, &error))
 			return error;
 
 		/* Read a block's worth of computed rtsummary file. */
-		error = xfsum_copyout(sc, sumoff, sc->buf, mp->m_blockwsize);
+		error = xfsum_copyout(sc, sumoff, rts->words, mp->m_blockwsize);
 		if (error) {
-			xfs_rtbuf_cache_relse(&args);
+			xfs_rtbuf_cache_relse(&rts->args);
 			return error;
 		}
 
-		ondisk_info = xfs_rsumblock_infoptr(&args, 0);
-		if (memcmp(ondisk_info, sc->buf,
-					mp->m_blockwsize << XFS_WORDLOG) != 0)
+		ondisk_info = xfs_rsumblock_infoptr(&rts->args, 0);
+		if (memcmp(ondisk_info, rts->words,
+					mp->m_blockwsize << XFS_WORDLOG) != 0) {
 			xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, off);
+			xfs_rtbuf_cache_relse(&rts->args);
+			return error;
+		}
 
-		xfs_rtbuf_cache_relse(&args);
+		xfs_rtbuf_cache_relse(&rts->args);
 		sumoff += mp->m_blockwsize;
 	}
 
@@ -253,8 +296,43 @@  xchk_rtsummary(
 	struct xfs_scrub	*sc)
 {
 	struct xfs_mount	*mp = sc->mp;
+	struct xchk_rtsummary	*rts = sc->buf;
 	int			error = 0;
 
+	/* Is sb_rextents correct? */
+	if (mp->m_sb.sb_rextents != rts->rextents) {
+		xchk_ino_set_corrupt(sc, mp->m_rbmip->i_ino);
+		goto out_rbm;
+	}
+
+	/* Is m_rsumlevels correct? */
+	if (mp->m_rsumlevels != rts->rsumlevels) {
+		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
+		goto out_rbm;
+	}
+
+	/* Is m_rsumsize correct? */
+	if (mp->m_rsumsize != rts->rsumsize) {
+		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
+		goto out_rbm;
+	}
+
+	/* The summary file length must be aligned to an fsblock. */
+	if (mp->m_rsumip->i_disk_size & mp->m_blockmask) {
+		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
+		goto out_rbm;
+	}
+
+	/*
+	 * Is the summary file itself large enough to handle the rt volume?
+	 * growfsrt expands the summary file before updating sb_rextents, so
+	 * the file can be larger than rsumsize.
+	 */
+	if (mp->m_rsumip->i_disk_size < rts->rsumsize) {
+		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
+		goto out_rbm;
+	}
+
 	/* Invoke the fork scrubber. */
 	error = xchk_metadata_inode_forks(sc);
 	if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))