diff mbox series

xfs: don't commit sunit/swidth updates to disk if that would cause repair failures

Message ID 20191204170340.GR7335@magnolia (mailing list archive)
State Superseded
Headers show
Series xfs: don't commit sunit/swidth updates to disk if that would cause repair failures | expand

Commit Message

Darrick J. Wong Dec. 4, 2019, 5:03 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
and swidth values could cause xfs_repair to fail loudly.  The problem
here is that repair calculates the where mkfs should have allocated the
root inode, based on the superblock geometry.  The allocation decisions
depend on sunit, which means that we really can't go updating sunit if
it would lead to a subsequent repair failure on an otherwise correct
filesystem.

Port the computation code from xfs_repair and teach mount to avoid the
ondisk update if it would cause problems for repair.  We allow the mount
to proceed (and new allocations will reflect this new geometry) because
we've never screened this kind of thing before.

[1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d

Reported-by: Alex Lyakas <alex@zadara.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: compute the root inode location directly
---
 fs/xfs/libxfs/xfs_ialloc.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ialloc.h |    1 +
 fs/xfs/xfs_mount.c         |   51 ++++++++++++++++++----------
 3 files changed, 115 insertions(+), 18 deletions(-)

Comments

Brian Foster Dec. 5, 2019, 2:36 p.m. UTC | #1
On Wed, Dec 04, 2019 at 09:03:40AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> and swidth values could cause xfs_repair to fail loudly.  The problem
> here is that repair calculates the where mkfs should have allocated the
> root inode, based on the superblock geometry.  The allocation decisions
> depend on sunit, which means that we really can't go updating sunit if
> it would lead to a subsequent repair failure on an otherwise correct
> filesystem.
> 
> Port the computation code from xfs_repair and teach mount to avoid the
> ondisk update if it would cause problems for repair.  We allow the mount
> to proceed (and new allocations will reflect this new geometry) because
> we've never screened this kind of thing before.
> 
> [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
> 
> Reported-by: Alex Lyakas <alex@zadara.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: compute the root inode location directly
> ---
>  fs/xfs/libxfs/xfs_ialloc.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_ialloc.h |    1 +
>  fs/xfs/xfs_mount.c         |   51 ++++++++++++++++++----------
>  3 files changed, 115 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 988cde7744e6..6df9bcc96251 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -2909,3 +2909,84 @@ xfs_ialloc_setup_geometry(
>  	else
>  		igeo->ialloc_align = 0;
>  }
> +
> +/*
> + * Compute the location of the root directory inode that is laid out by mkfs.
> + * The @sunit parameter will be copied from the superblock if it is negative.
> + */
> +xfs_ino_t
> +xfs_ialloc_calc_rootino(
> +	struct xfs_mount	*mp,
> +	int			sunit)
> +{
> +	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
> +	xfs_agino_t		first_agino;
> +	xfs_agblock_t		first_bno;
> +
> +	if (sunit < 0)
> +		sunit = mp->m_sb.sb_unit;
> +
> +	/*
> +	 * Pre-calculate the geometry of ag 0. We know what it looks like
> +	 * because we know what mkfs does: 2 allocation btree roots (by block
> +	 * and by size), the inode allocation btree root, the free inode
> +	 * allocation btree root (if enabled) and some number of blocks to
> +	 * prefill the agfl.
> +	 *
> +	 * Because the current shape of the btrees may differ from the current
> +	 * shape, we open code the mkfs freelist block count here. mkfs creates
> +	 * single level trees, so the calculation is pretty straight forward for
> +	 * the trees that use the AGFL.
> +	 */
> +

I know this code is lifted from userspace, but.. "the current shape of
the btrees may differ from the current shape, .." Eh?

> +	/* free space by block btree root comes after the ag headers */
> +	first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
> +
> +	/* free space by length btree root */
> +	first_bno += 1;
> +
> +	/* inode btree root */
> +	first_bno += 1;
> +
> +	/* agfl */
> +	first_bno += (2 * min_t(xfs_agblock_t, 2, mp->m_ag_maxlevels)) + 1;
> +

This is a little subtle from the userspace code. The extra +1 here is
where we go from pointing at metadata blocks (i.e. bnobt root) to the
first free block (i.e., past metadata blocks), right? If so, I wonder if
this should be incorporated either at the beginning or end with a
comment for explanation (i.e. "Start by pointing at the first block
after the AG headers and increment by size of applicable metadata to
locate the first free block ...").

I'm guessing this is a historical artifact of the userspace code as
features were added. The fact that userspace uses different variables
somewhat helps self-document in that context, and we lose that here.

> +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> +		first_bno++;
> +
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		first_bno++;
> +		/* agfl blocks */
> +		first_bno += min_t(xfs_agblock_t, 2, mp->m_rmap_maxlevels);
> +	}
> +
> +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> +		first_bno++;
> +
> +	/*
> +	 * If the log is allocated in the first allocation group we need to
> +	 * add the number of blocks used by the log to the above calculation.
> +	 *
> +	 * This can happens with filesystems that only have a single
> +	 * allocation group, or very odd geometries created by old mkfs
> +	 * versions on very small filesystems.
> +	 */
> +	if (mp->m_sb.sb_logstart &&
> +	    XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0)
> +		 first_bno += mp->m_sb.sb_logblocks;
> +
> +	/*
> +	 * ditto the location of the first inode chunks in the fs ('/')
> +	 */
> +	if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0) {
> +		first_agino = XFS_AGB_TO_AGINO(mp, roundup(first_bno, sunit));
> +	} else if (xfs_sb_version_hasalign(&mp->m_sb) &&
> +		   mp->m_sb.sb_inoalignmt > 1)  {
> +		first_agino = XFS_AGB_TO_AGINO(mp,
> +				roundup(first_bno, mp->m_sb.sb_inoalignmt));
> +	} else  {
> +		first_agino = XFS_AGB_TO_AGINO(mp, first_bno);
> +	}
> +
> +	return XFS_AGINO_TO_INO(mp, 0, first_agino);
> +}
> diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> index 323592d563d5..72b3468b97b1 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.h
> +++ b/fs/xfs/libxfs/xfs_ialloc.h
> @@ -152,5 +152,6 @@ int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
>  
>  int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
>  void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
> +xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
>  
>  #endif	/* __XFS_IALLOC_H__ */
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index fca65109cf24..a4eb3ae34a84 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -363,9 +363,10 @@ xfs_readsb(
>   * Update alignment values based on mount options and sb values
>   */
>  STATIC int
> -xfs_update_alignment(xfs_mount_t *mp)
> +xfs_update_alignment(
> +	struct xfs_mount	*mp)
>  {
> -	xfs_sb_t	*sbp = &(mp->m_sb);
> +	struct xfs_sb		*sbp = &mp->m_sb;
>  
>  	if (mp->m_dalign) {
>  		/*
> @@ -398,28 +399,42 @@ xfs_update_alignment(xfs_mount_t *mp)
>  			}
>  		}
>  
> -		/*
> -		 * Update superblock with new values
> -		 * and log changes
> -		 */
> -		if (xfs_sb_version_hasdalign(sbp)) {
> -			if (sbp->sb_unit != mp->m_dalign) {
> -				sbp->sb_unit = mp->m_dalign;
> -				mp->m_update_sb = true;
> -			}
> -			if (sbp->sb_width != mp->m_swidth) {
> -				sbp->sb_width = mp->m_swidth;
> -				mp->m_update_sb = true;
> -			}
> -		} else {
> +		/* Update superblock with new values and log changes. */
> +		if (!xfs_sb_version_hasdalign(sbp)) {
>  			xfs_warn(mp,
>  	"cannot change alignment: superblock does not support data alignment");
>  			return -EINVAL;
>  		}
> +
> +		if (sbp->sb_unit == mp->m_dalign &&
> +		    sbp->sb_width == mp->m_swidth)
> +			return 0;
> +
> +		/*
> +		 * If the sunit/swidth change would move the precomputed root
> +		 * inode value, we must reject the ondisk change because repair
> +		 * will stumble over that.  However, we allow the mount to
> +		 * proceed because we never rejected this combination before.
> +		 */
> +		if (sbp->sb_rootino !=
> +		    xfs_ialloc_calc_rootino(mp, mp->m_dalign)) {
> +			xfs_warn(mp,
> +	"cannot change stripe alignment: would require moving root inode");
> +

FWIW, I read this error message as the mount option was ignored. I don't
much care whether we ignore the mount option or simply the on-disk
update, but the error could be a bit more clear in the latter case.
Also, what is the expected behavior for xfs_info in the latter
situation?

Brian

> +			/*
> +			 * XXX: Next time we add a new incompat feature, this
> +			 * should start returning -EINVAL.
> +			 */
> +			return 0;
> +		}
> +
> +		sbp->sb_unit = mp->m_dalign;
> +		sbp->sb_width = mp->m_swidth;
> +		mp->m_update_sb = true;
>  	} else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
>  		    xfs_sb_version_hasdalign(&mp->m_sb)) {
> -			mp->m_dalign = sbp->sb_unit;
> -			mp->m_swidth = sbp->sb_width;
> +		mp->m_dalign = sbp->sb_unit;
> +		mp->m_swidth = sbp->sb_width;
>  	}
>  
>  	return 0;
>
Darrick J. Wong Dec. 5, 2019, 9:42 p.m. UTC | #2
On Thu, Dec 05, 2019 at 09:36:18AM -0500, Brian Foster wrote:
> On Wed, Dec 04, 2019 at 09:03:40AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> > and swidth values could cause xfs_repair to fail loudly.  The problem
> > here is that repair calculates the where mkfs should have allocated the
> > root inode, based on the superblock geometry.  The allocation decisions
> > depend on sunit, which means that we really can't go updating sunit if
> > it would lead to a subsequent repair failure on an otherwise correct
> > filesystem.
> > 
> > Port the computation code from xfs_repair and teach mount to avoid the
> > ondisk update if it would cause problems for repair.  We allow the mount
> > to proceed (and new allocations will reflect this new geometry) because
> > we've never screened this kind of thing before.
> > 
> > [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
> > 
> > Reported-by: Alex Lyakas <alex@zadara.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: compute the root inode location directly
> > ---
> >  fs/xfs/libxfs/xfs_ialloc.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_ialloc.h |    1 +
> >  fs/xfs/xfs_mount.c         |   51 ++++++++++++++++++----------
> >  3 files changed, 115 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > index 988cde7744e6..6df9bcc96251 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > @@ -2909,3 +2909,84 @@ xfs_ialloc_setup_geometry(
> >  	else
> >  		igeo->ialloc_align = 0;
> >  }
> > +
> > +/*
> > + * Compute the location of the root directory inode that is laid out by mkfs.
> > + * The @sunit parameter will be copied from the superblock if it is negative.
> > + */
> > +xfs_ino_t
> > +xfs_ialloc_calc_rootino(
> > +	struct xfs_mount	*mp,
> > +	int			sunit)
> > +{
> > +	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
> > +	xfs_agino_t		first_agino;
> > +	xfs_agblock_t		first_bno;
> > +
> > +	if (sunit < 0)
> > +		sunit = mp->m_sb.sb_unit;
> > +
> > +	/*
> > +	 * Pre-calculate the geometry of ag 0. We know what it looks like
> > +	 * because we know what mkfs does: 2 allocation btree roots (by block
> > +	 * and by size), the inode allocation btree root, the free inode
> > +	 * allocation btree root (if enabled) and some number of blocks to
> > +	 * prefill the agfl.
> > +	 *
> > +	 * Because the current shape of the btrees may differ from the current
> > +	 * shape, we open code the mkfs freelist block count here. mkfs creates
> > +	 * single level trees, so the calculation is pretty straight forward for
> > +	 * the trees that use the AGFL.
> > +	 */
> > +
> 
> I know this code is lifted from userspace, but.. "the current shape of
> the btrees may differ from the current shape, .." Eh?

Heh, ok, I'll clean up the comment too.

> > +	/* free space by block btree root comes after the ag headers */
> > +	first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
> > +
> > +	/* free space by length btree root */
> > +	first_bno += 1;
> > +
> > +	/* inode btree root */
> > +	first_bno += 1;
> > +
> > +	/* agfl */
> > +	first_bno += (2 * min_t(xfs_agblock_t, 2, mp->m_ag_maxlevels)) + 1;
> > +
> 
> This is a little subtle from the userspace code. The extra +1 here is
> where we go from pointing at metadata blocks (i.e. bnobt root) to the
> first free block (i.e., past metadata blocks), right? If so, I wonder if
> this should be incorporated either at the beginning or end with a
> comment for explanation (i.e. "Start by pointing at the first block
> after the AG headers and increment by size of applicable metadata to
> locate the first free block ...").
> 
> I'm guessing this is a historical artifact of the userspace code as
> features were added. The fact that userspace uses different variables
> somewhat helps self-document in that context, and we lose that here.

Yeah.  I'll rework the comments to make it clearer what's going on.

"first_bno is the first block in which we could allocate an inode
chunk after factoring in..."

"...the ag headers"

"...the bnobt / cntbt"

"...the inobt"

etc.

> > +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > +		first_bno++;
> > +
> > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > +		first_bno++;
> > +		/* agfl blocks */
> > +		first_bno += min_t(xfs_agblock_t, 2, mp->m_rmap_maxlevels);
> > +	}
> > +
> > +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > +		first_bno++;
> > +
> > +	/*
> > +	 * If the log is allocated in the first allocation group we need to
> > +	 * add the number of blocks used by the log to the above calculation.
> > +	 *
> > +	 * This can happens with filesystems that only have a single
> > +	 * allocation group, or very odd geometries created by old mkfs
> > +	 * versions on very small filesystems.
> > +	 */
> > +	if (mp->m_sb.sb_logstart &&
> > +	    XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0)
> > +		 first_bno += mp->m_sb.sb_logblocks;
> > +
> > +	/*
> > +	 * ditto the location of the first inode chunks in the fs ('/')
> > +	 */
> > +	if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0) {
> > +		first_agino = XFS_AGB_TO_AGINO(mp, roundup(first_bno, sunit));
> > +	} else if (xfs_sb_version_hasalign(&mp->m_sb) &&
> > +		   mp->m_sb.sb_inoalignmt > 1)  {
> > +		first_agino = XFS_AGB_TO_AGINO(mp,
> > +				roundup(first_bno, mp->m_sb.sb_inoalignmt));
> > +	} else  {
> > +		first_agino = XFS_AGB_TO_AGINO(mp, first_bno);
> > +	}
> > +
> > +	return XFS_AGINO_TO_INO(mp, 0, first_agino);
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> > index 323592d563d5..72b3468b97b1 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.h
> > +++ b/fs/xfs/libxfs/xfs_ialloc.h
> > @@ -152,5 +152,6 @@ int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
> >  
> >  int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
> >  void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
> > +xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
> >  
> >  #endif	/* __XFS_IALLOC_H__ */
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index fca65109cf24..a4eb3ae34a84 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -363,9 +363,10 @@ xfs_readsb(
> >   * Update alignment values based on mount options and sb values
> >   */
> >  STATIC int
> > -xfs_update_alignment(xfs_mount_t *mp)
> > +xfs_update_alignment(
> > +	struct xfs_mount	*mp)
> >  {
> > -	xfs_sb_t	*sbp = &(mp->m_sb);
> > +	struct xfs_sb		*sbp = &mp->m_sb;
> >  
> >  	if (mp->m_dalign) {
> >  		/*
> > @@ -398,28 +399,42 @@ xfs_update_alignment(xfs_mount_t *mp)
> >  			}
> >  		}
> >  
> > -		/*
> > -		 * Update superblock with new values
> > -		 * and log changes
> > -		 */
> > -		if (xfs_sb_version_hasdalign(sbp)) {
> > -			if (sbp->sb_unit != mp->m_dalign) {
> > -				sbp->sb_unit = mp->m_dalign;
> > -				mp->m_update_sb = true;
> > -			}
> > -			if (sbp->sb_width != mp->m_swidth) {
> > -				sbp->sb_width = mp->m_swidth;
> > -				mp->m_update_sb = true;
> > -			}
> > -		} else {
> > +		/* Update superblock with new values and log changes. */
> > +		if (!xfs_sb_version_hasdalign(sbp)) {
> >  			xfs_warn(mp,
> >  	"cannot change alignment: superblock does not support data alignment");
> >  			return -EINVAL;
> >  		}
> > +
> > +		if (sbp->sb_unit == mp->m_dalign &&
> > +		    sbp->sb_width == mp->m_swidth)
> > +			return 0;
> > +
> > +		/*
> > +		 * If the sunit/swidth change would move the precomputed root
> > +		 * inode value, we must reject the ondisk change because repair
> > +		 * will stumble over that.  However, we allow the mount to
> > +		 * proceed because we never rejected this combination before.
> > +		 */
> > +		if (sbp->sb_rootino !=
> > +		    xfs_ialloc_calc_rootino(mp, mp->m_dalign)) {
> > +			xfs_warn(mp,
> > +	"cannot change stripe alignment: would require moving root inode");
> > +
> 
> FWIW, I read this error message as the mount option was ignored. I don't
> much care whether we ignore the mount option or simply the on-disk
> update, but the error could be a bit more clear in the latter case.

Ok, I'll add a message about how we're skipping the sb update.

> Also, what is the expected behavior for xfs_info in the latter
> situation?

A previous revision of the patch had the ioctl feeding xfs_info using
the incore values, but Dave objected so I dropped it.

--D

> Brian
> 
> > +			/*
> > +			 * XXX: Next time we add a new incompat feature, this
> > +			 * should start returning -EINVAL.
> > +			 */
> > +			return 0;
> > +		}
> > +
> > +		sbp->sb_unit = mp->m_dalign;
> > +		sbp->sb_width = mp->m_swidth;
> > +		mp->m_update_sb = true;
> >  	} else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
> >  		    xfs_sb_version_hasdalign(&mp->m_sb)) {
> > -			mp->m_dalign = sbp->sb_unit;
> > -			mp->m_swidth = sbp->sb_width;
> > +		mp->m_dalign = sbp->sb_unit;
> > +		mp->m_swidth = sbp->sb_width;
> >  	}
> >  
> >  	return 0;
> > 
>
Brian Foster Dec. 6, 2019, 10:57 a.m. UTC | #3
On Thu, Dec 05, 2019 at 01:42:22PM -0800, Darrick J. Wong wrote:
> On Thu, Dec 05, 2019 at 09:36:18AM -0500, Brian Foster wrote:
> > On Wed, Dec 04, 2019 at 09:03:40AM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> > > and swidth values could cause xfs_repair to fail loudly.  The problem
> > > here is that repair calculates the where mkfs should have allocated the
> > > root inode, based on the superblock geometry.  The allocation decisions
> > > depend on sunit, which means that we really can't go updating sunit if
> > > it would lead to a subsequent repair failure on an otherwise correct
> > > filesystem.
> > > 
> > > Port the computation code from xfs_repair and teach mount to avoid the
> > > ondisk update if it would cause problems for repair.  We allow the mount
> > > to proceed (and new allocations will reflect this new geometry) because
> > > we've never screened this kind of thing before.
> > > 
> > > [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
> > > 
> > > Reported-by: Alex Lyakas <alex@zadara.com>
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > v2: compute the root inode location directly
> > > ---
> > >  fs/xfs/libxfs/xfs_ialloc.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/libxfs/xfs_ialloc.h |    1 +
> > >  fs/xfs/xfs_mount.c         |   51 ++++++++++++++++++----------
> > >  3 files changed, 115 insertions(+), 18 deletions(-)
> > > 
...
> > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > index fca65109cf24..a4eb3ae34a84 100644
> > > --- a/fs/xfs/xfs_mount.c
> > > +++ b/fs/xfs/xfs_mount.c
...
> > > @@ -398,28 +399,42 @@ xfs_update_alignment(xfs_mount_t *mp)
> > >  			}
> > >  		}
> > >  
> > > -		/*
> > > -		 * Update superblock with new values
> > > -		 * and log changes
> > > -		 */
> > > -		if (xfs_sb_version_hasdalign(sbp)) {
> > > -			if (sbp->sb_unit != mp->m_dalign) {
> > > -				sbp->sb_unit = mp->m_dalign;
> > > -				mp->m_update_sb = true;
> > > -			}
> > > -			if (sbp->sb_width != mp->m_swidth) {
> > > -				sbp->sb_width = mp->m_swidth;
> > > -				mp->m_update_sb = true;
> > > -			}
> > > -		} else {
> > > +		/* Update superblock with new values and log changes. */
> > > +		if (!xfs_sb_version_hasdalign(sbp)) {
> > >  			xfs_warn(mp,
> > >  	"cannot change alignment: superblock does not support data alignment");
> > >  			return -EINVAL;
> > >  		}
> > > +
> > > +		if (sbp->sb_unit == mp->m_dalign &&
> > > +		    sbp->sb_width == mp->m_swidth)
> > > +			return 0;
> > > +
> > > +		/*
> > > +		 * If the sunit/swidth change would move the precomputed root
> > > +		 * inode value, we must reject the ondisk change because repair
> > > +		 * will stumble over that.  However, we allow the mount to
> > > +		 * proceed because we never rejected this combination before.
> > > +		 */
> > > +		if (sbp->sb_rootino !=
> > > +		    xfs_ialloc_calc_rootino(mp, mp->m_dalign)) {
> > > +			xfs_warn(mp,
> > > +	"cannot change stripe alignment: would require moving root inode");
> > > +
> > 
> > FWIW, I read this error message as the mount option was ignored. I don't
> > much care whether we ignore the mount option or simply the on-disk
> > update, but the error could be a bit more clear in the latter case.
> 
> Ok, I'll add a message about how we're skipping the sb update.
> 
> > Also, what is the expected behavior for xfs_info in the latter
> > situation?
> 
> A previous revision of the patch had the ioctl feeding xfs_info using
> the incore values, but Dave objected so I dropped it.
> 

Ok, could you document the expected behavior for this new state in the
commit log so it's clear when looking back at it? I.e., xfs_info should
return superblock values, xfs_growfs should update based on superblock
values, etc.

Brian

> --D
> 
> > Brian
> > 
> > > +			/*
> > > +			 * XXX: Next time we add a new incompat feature, this
> > > +			 * should start returning -EINVAL.
> > > +			 */
> > > +			return 0;
> > > +		}
> > > +
> > > +		sbp->sb_unit = mp->m_dalign;
> > > +		sbp->sb_width = mp->m_swidth;
> > > +		mp->m_update_sb = true;
> > >  	} else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
> > >  		    xfs_sb_version_hasdalign(&mp->m_sb)) {
> > > -			mp->m_dalign = sbp->sb_unit;
> > > -			mp->m_swidth = sbp->sb_width;
> > > +		mp->m_dalign = sbp->sb_unit;
> > > +		mp->m_swidth = sbp->sb_width;
> > >  	}
> > >  
> > >  	return 0;
> > > 
> > 
>
Darrick J. Wong Dec. 13, 2019, 4:10 p.m. UTC | #4
On Fri, Dec 06, 2019 at 05:57:51AM -0500, Brian Foster wrote:
> On Thu, Dec 05, 2019 at 01:42:22PM -0800, Darrick J. Wong wrote:
> > On Thu, Dec 05, 2019 at 09:36:18AM -0500, Brian Foster wrote:
> > > On Wed, Dec 04, 2019 at 09:03:40AM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> > > > and swidth values could cause xfs_repair to fail loudly.  The problem
> > > > here is that repair calculates the where mkfs should have allocated the
> > > > root inode, based on the superblock geometry.  The allocation decisions
> > > > depend on sunit, which means that we really can't go updating sunit if
> > > > it would lead to a subsequent repair failure on an otherwise correct
> > > > filesystem.
> > > > 
> > > > Port the computation code from xfs_repair and teach mount to avoid the
> > > > ondisk update if it would cause problems for repair.  We allow the mount
> > > > to proceed (and new allocations will reflect this new geometry) because
> > > > we've never screened this kind of thing before.
> > > > 
> > > > [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
> > > > 
> > > > Reported-by: Alex Lyakas <alex@zadara.com>
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > > v2: compute the root inode location directly
> > > > ---
> > > >  fs/xfs/libxfs/xfs_ialloc.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/libxfs/xfs_ialloc.h |    1 +
> > > >  fs/xfs/xfs_mount.c         |   51 ++++++++++++++++++----------
> > > >  3 files changed, 115 insertions(+), 18 deletions(-)
> > > > 
> ...
> > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > > index fca65109cf24..a4eb3ae34a84 100644
> > > > --- a/fs/xfs/xfs_mount.c
> > > > +++ b/fs/xfs/xfs_mount.c
> ...
> > > > @@ -398,28 +399,42 @@ xfs_update_alignment(xfs_mount_t *mp)
> > > >  			}
> > > >  		}
> > > >  
> > > > -		/*
> > > > -		 * Update superblock with new values
> > > > -		 * and log changes
> > > > -		 */
> > > > -		if (xfs_sb_version_hasdalign(sbp)) {
> > > > -			if (sbp->sb_unit != mp->m_dalign) {
> > > > -				sbp->sb_unit = mp->m_dalign;
> > > > -				mp->m_update_sb = true;
> > > > -			}
> > > > -			if (sbp->sb_width != mp->m_swidth) {
> > > > -				sbp->sb_width = mp->m_swidth;
> > > > -				mp->m_update_sb = true;
> > > > -			}
> > > > -		} else {
> > > > +		/* Update superblock with new values and log changes. */
> > > > +		if (!xfs_sb_version_hasdalign(sbp)) {
> > > >  			xfs_warn(mp,
> > > >  	"cannot change alignment: superblock does not support data alignment");
> > > >  			return -EINVAL;
> > > >  		}
> > > > +
> > > > +		if (sbp->sb_unit == mp->m_dalign &&
> > > > +		    sbp->sb_width == mp->m_swidth)
> > > > +			return 0;
> > > > +
> > > > +		/*
> > > > +		 * If the sunit/swidth change would move the precomputed root
> > > > +		 * inode value, we must reject the ondisk change because repair
> > > > +		 * will stumble over that.  However, we allow the mount to
> > > > +		 * proceed because we never rejected this combination before.
> > > > +		 */
> > > > +		if (sbp->sb_rootino !=
> > > > +		    xfs_ialloc_calc_rootino(mp, mp->m_dalign)) {
> > > > +			xfs_warn(mp,
> > > > +	"cannot change stripe alignment: would require moving root inode");
> > > > +
> > > 
> > > FWIW, I read this error message as the mount option was ignored. I don't
> > > much care whether we ignore the mount option or simply the on-disk
> > > update, but the error could be a bit more clear in the latter case.
> > 
> > Ok, I'll add a message about how we're skipping the sb update.
> > 
> > > Also, what is the expected behavior for xfs_info in the latter
> > > situation?
> > 
> > A previous revision of the patch had the ioctl feeding xfs_info using
> > the incore values, but Dave objected so I dropped it.
> > 
> 
> Ok, could you document the expected behavior for this new state in the
> commit log so it's clear when looking back at it? I.e., xfs_info should
> return superblock values, xfs_growfs should update based on superblock
> values, etc.

Yeah, I'll do that.

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > > +			/*
> > > > +			 * XXX: Next time we add a new incompat feature, this
> > > > +			 * should start returning -EINVAL.
> > > > +			 */
> > > > +			return 0;
> > > > +		}
> > > > +
> > > > +		sbp->sb_unit = mp->m_dalign;
> > > > +		sbp->sb_width = mp->m_swidth;
> > > > +		mp->m_update_sb = true;
> > > >  	} else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
> > > >  		    xfs_sb_version_hasdalign(&mp->m_sb)) {
> > > > -			mp->m_dalign = sbp->sb_unit;
> > > > -			mp->m_swidth = sbp->sb_width;
> > > > +		mp->m_dalign = sbp->sb_unit;
> > > > +		mp->m_swidth = sbp->sb_width;
> > > >  	}
> > > >  
> > > >  	return 0;
> > > > 
> > > 
> > 
>
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 988cde7744e6..6df9bcc96251 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2909,3 +2909,84 @@  xfs_ialloc_setup_geometry(
 	else
 		igeo->ialloc_align = 0;
 }
+
+/*
+ * Compute the location of the root directory inode that is laid out by mkfs.
+ * The @sunit parameter will be copied from the superblock if it is negative.
+ */
+xfs_ino_t
+xfs_ialloc_calc_rootino(
+	struct xfs_mount	*mp,
+	int			sunit)
+{
+	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
+	xfs_agino_t		first_agino;
+	xfs_agblock_t		first_bno;
+
+	if (sunit < 0)
+		sunit = mp->m_sb.sb_unit;
+
+	/*
+	 * Pre-calculate the geometry of ag 0. We know what it looks like
+	 * because we know what mkfs does: 2 allocation btree roots (by block
+	 * and by size), the inode allocation btree root, the free inode
+	 * allocation btree root (if enabled) and some number of blocks to
+	 * prefill the agfl.
+	 *
+	 * Because the current shape of the btrees may differ from the current
+	 * shape, we open code the mkfs freelist block count here. mkfs creates
+	 * single level trees, so the calculation is pretty straight forward for
+	 * the trees that use the AGFL.
+	 */
+
+	/* free space by block btree root comes after the ag headers */
+	first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
+
+	/* free space by length btree root */
+	first_bno += 1;
+
+	/* inode btree root */
+	first_bno += 1;
+
+	/* agfl */
+	first_bno += (2 * min_t(xfs_agblock_t, 2, mp->m_ag_maxlevels)) + 1;
+
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		first_bno++;
+
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		first_bno++;
+		/* agfl blocks */
+		first_bno += min_t(xfs_agblock_t, 2, mp->m_rmap_maxlevels);
+	}
+
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		first_bno++;
+
+	/*
+	 * If the log is allocated in the first allocation group we need to
+	 * add the number of blocks used by the log to the above calculation.
+	 *
+	 * This can happens with filesystems that only have a single
+	 * allocation group, or very odd geometries created by old mkfs
+	 * versions on very small filesystems.
+	 */
+	if (mp->m_sb.sb_logstart &&
+	    XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0)
+		 first_bno += mp->m_sb.sb_logblocks;
+
+	/*
+	 * ditto the location of the first inode chunks in the fs ('/')
+	 */
+	if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0) {
+		first_agino = XFS_AGB_TO_AGINO(mp, roundup(first_bno, sunit));
+	} else if (xfs_sb_version_hasalign(&mp->m_sb) &&
+		   mp->m_sb.sb_inoalignmt > 1)  {
+		first_agino = XFS_AGB_TO_AGINO(mp,
+				roundup(first_bno, mp->m_sb.sb_inoalignmt));
+	} else  {
+		first_agino = XFS_AGB_TO_AGINO(mp, first_bno);
+	}
+
+	return XFS_AGINO_TO_INO(mp, 0, first_agino);
+}
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index 323592d563d5..72b3468b97b1 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -152,5 +152,6 @@  int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
 
 int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
 void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
+xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
 
 #endif	/* __XFS_IALLOC_H__ */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index fca65109cf24..a4eb3ae34a84 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -363,9 +363,10 @@  xfs_readsb(
  * Update alignment values based on mount options and sb values
  */
 STATIC int
-xfs_update_alignment(xfs_mount_t *mp)
+xfs_update_alignment(
+	struct xfs_mount	*mp)
 {
-	xfs_sb_t	*sbp = &(mp->m_sb);
+	struct xfs_sb		*sbp = &mp->m_sb;
 
 	if (mp->m_dalign) {
 		/*
@@ -398,28 +399,42 @@  xfs_update_alignment(xfs_mount_t *mp)
 			}
 		}
 
-		/*
-		 * Update superblock with new values
-		 * and log changes
-		 */
-		if (xfs_sb_version_hasdalign(sbp)) {
-			if (sbp->sb_unit != mp->m_dalign) {
-				sbp->sb_unit = mp->m_dalign;
-				mp->m_update_sb = true;
-			}
-			if (sbp->sb_width != mp->m_swidth) {
-				sbp->sb_width = mp->m_swidth;
-				mp->m_update_sb = true;
-			}
-		} else {
+		/* Update superblock with new values and log changes. */
+		if (!xfs_sb_version_hasdalign(sbp)) {
 			xfs_warn(mp,
 	"cannot change alignment: superblock does not support data alignment");
 			return -EINVAL;
 		}
+
+		if (sbp->sb_unit == mp->m_dalign &&
+		    sbp->sb_width == mp->m_swidth)
+			return 0;
+
+		/*
+		 * If the sunit/swidth change would move the precomputed root
+		 * inode value, we must reject the ondisk change because repair
+		 * will stumble over that.  However, we allow the mount to
+		 * proceed because we never rejected this combination before.
+		 */
+		if (sbp->sb_rootino !=
+		    xfs_ialloc_calc_rootino(mp, mp->m_dalign)) {
+			xfs_warn(mp,
+	"cannot change stripe alignment: would require moving root inode");
+
+			/*
+			 * XXX: Next time we add a new incompat feature, this
+			 * should start returning -EINVAL.
+			 */
+			return 0;
+		}
+
+		sbp->sb_unit = mp->m_dalign;
+		sbp->sb_width = mp->m_swidth;
+		mp->m_update_sb = true;
 	} else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
 		    xfs_sb_version_hasdalign(&mp->m_sb)) {
-			mp->m_dalign = sbp->sb_unit;
-			mp->m_swidth = sbp->sb_width;
+		mp->m_dalign = sbp->sb_unit;
+		mp->m_swidth = sbp->sb_width;
 	}
 
 	return 0;