diff mbox series

[2/2] xfs: validate extsz hints against rt extent size when rtinherit is set

Message ID 162152894725.2694219.2966158387963381824.stgit@magnolia (mailing list archive)
State Superseded
Headers show
Series xfs: strengthen validation of extent size hints | expand

Commit Message

Darrick J. Wong May 20, 2021, 4:42 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

The RTINHERIT bit can be set on a directory so that newly created
regular files will have the REALTIME bit set to store their data on the
realtime volume.  If an extent size hint (and EXTSZINHERIT) are set on
the directory, the hint will also be copied into the new file.

As pointed out in previous patches, for realtime files we require the
extent size hint be an integer multiple of the realtime extent, but we
don't perform the same validation on a directory with both RTINHERIT and
EXTSZINHERIT set, even though the only use-case of that combination is
to propagate extent size hints into new realtime files.  This leads to
inode corruption errors when the bad values are propagated.

Because there may be existing filesystems with such a configuration, we
cannot simply amend the inode verifier to trip on these directories and
call it a day because that will cause previously "working" filesystems
to start throwing errors abruptly.  Note that it's valid to have
directories with rtinherit set even if there is no realtime volume, in
which case the problem does not manifest because rtinherit is ignored if
there's no realtime device; and it's possible that someone set the flag,
crashed, repaired the filesystem (which clears the hint on the realtime
file) and continued.

Therefore, mitigate this issue in several ways: First, if we try to
write out an inode with both rtinherit/extszinherit set and an unaligned
extent size hint, we'll simply turn off the hint to correct the error.
Second, if someone tries to misconfigure a file via the fssetxattr
ioctl, we'll fail the ioctl.  Third, we reverify both extent size hint
values when we propagate heritable inode attributes from parent to
child, so that we prevent misconfigurations from spreading.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_inode_buf.c   |   15 ++++++++++++-
 fs/xfs/libxfs/xfs_trans_inode.c |   13 +++++++++++
 fs/xfs/xfs_inode.c              |   46 ++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_ioctl.c              |   14 ++++++++++++
 4 files changed, 74 insertions(+), 14 deletions(-)

Comments

Christoph Hellwig May 21, 2021, 7:49 a.m. UTC | #1
On Thu, May 20, 2021 at 09:42:27AM -0700, Darrick J. Wong wrote:
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 045118c7bf78..dce267dbea5f 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -589,8 +589,21 @@ xfs_inode_validate_extsize(
>  	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
>  	extsize_bytes = XFS_FSB_TO_B(mp, extsize);
>  
> +	/*
> +	 * Historically, XFS didn't check that the extent size hint was an
> +	 * integer multiple of the rt extent size on a directory with both
> +	 * rtinherit and extszinherit flags set.  This results in math errors
> +	 * in the rt allocator and inode verifier errors when the unaligned
> +	 * hint value propagates into new realtime files.  Since there might
> +	 * be filesystems in the wild, the best we can do for now is to
> +	 * mitigate the harms by stopping the propagation.
> +	 *
> +	 * The next time we add a new inode feature, the if test below should
> +	 * also trigger if that new feature is enabled and (rtinherit_flag &&
> +	 * inherit_flag).
> +	 */

I don't understand how this comment relates to the change below, and
in fact I don't understand the last paragraph at all.

>  	if (rt_flag)
> -		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
> +		blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);

This just looks like a cleanup, that is replace the open coded version
of XFS_FSB_TO_B with the actual helper.

> +	/*
> +	 * Clear invalid extent size hints set on files with rtinherit and
> +	 * extszinherit set.  See the comments in xfs_inode_validate_extsize
> +	 * for details.
> +	 */

Maybe that comment should be here as it makes a whole lot more sense
where we do the actual clearing.

> +	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
> +	    (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
> +	    (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
> +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT);

Overly long line.

> +	xfs_failaddr_t		failaddr;
>  	umode_t			mode = VFS_I(ip)->i_mode;
>  
>  	if (S_ISDIR(mode)) {
>  		if (pip->i_diflags & XFS_DIFLAG_RTINHERIT)
> -			di_flags |= XFS_DIFLAG_RTINHERIT;
> +			ip->i_diflags |= XFS_DIFLAG_RTINHERIT;

The removal of the di_flags seems like an unrelated (though nice)
cleanup.
Darrick J. Wong May 21, 2021, 7:31 p.m. UTC | #2
On Fri, May 21, 2021 at 08:49:03AM +0100, Christoph Hellwig wrote:
> On Thu, May 20, 2021 at 09:42:27AM -0700, Darrick J. Wong wrote:
> > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > index 045118c7bf78..dce267dbea5f 100644
> > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > @@ -589,8 +589,21 @@ xfs_inode_validate_extsize(
> >  	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
> >  	extsize_bytes = XFS_FSB_TO_B(mp, extsize);
> >  
> > +	/*
> > +	 * Historically, XFS didn't check that the extent size hint was an
> > +	 * integer multiple of the rt extent size on a directory with both
> > +	 * rtinherit and extszinherit flags set.  This results in math errors
> > +	 * in the rt allocator and inode verifier errors when the unaligned
> > +	 * hint value propagates into new realtime files.  Since there might
> > +	 * be filesystems in the wild, the best we can do for now is to
> > +	 * mitigate the harms by stopping the propagation.
> > +	 *
> > +	 * The next time we add a new inode feature, the if test below should
> > +	 * also trigger if that new feature is enabled and (rtinherit_flag &&
> > +	 * inherit_flag).
> > +	 */
> 
> I don't understand how this comment relates to the change below, and
> in fact I don't understand the last paragraph at all.

It doesn't relate to the cleanup below at all.  I put a comment there to
explain why this verifier function doesn't check the rextsize alignment
of the hint.  In other words, it's a comment describing a gap in a
validation function and why we can't remove the gap here but have to
sprinkle mitigations everywhere else.

Earlier iterations of this patch simply fixed the verifier and allowed
existing filesystems to fail.  Brian and I weren't totally comfortable
with introducing a breaking change like that even though the
consequences of the bad hint actually propagating into a realtime file
are immediate filesystem corruption shutdowns, so this iteration
mitigates the propagation issue by fixing directories when they get
rewritten and preventing propagation of bad hints into new files.

I don't want to introduce a new feature/inode flag just to say
"corrected rt extent size hint" since having an extent size hint set on
a filesystem with a realtime extent size larger than 1 fsb on a
filesystem with a realtime volume configured at all is a vanishingly
rare condition.

Granted, we could decide to introduce the breaking verifier change and
say "fmeh to this corner case, if you did that you're screwed anyway",
but I think I want a few more ACKs on /that/ strategy before I do that.

> >  	if (rt_flag)
> > -		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
> > +		blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
> 
> This just looks like a cleanup, that is replace the open coded version
> of XFS_FSB_TO_B with the actual helper.

Yes.  As I mentioned above, the comment documents code that isn't there
and cannot be added.

> > +	/*
> > +	 * Clear invalid extent size hints set on files with rtinherit and
> > +	 * extszinherit set.  See the comments in xfs_inode_validate_extsize
> > +	 * for details.
> > +	 */
> 
> Maybe that comment should be here as it makes a whole lot more sense
> where we do the actual clearing.

Ok.

> 
> > +	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
> > +	    (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
> > +	    (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
> > +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT);
> 
> Overly long line.

Fixed, though Linus said to stop caring about code that goes slightly
over.

> > +	xfs_failaddr_t		failaddr;
> >  	umode_t			mode = VFS_I(ip)->i_mode;
> >  
> >  	if (S_ISDIR(mode)) {
> >  		if (pip->i_diflags & XFS_DIFLAG_RTINHERIT)
> > -			di_flags |= XFS_DIFLAG_RTINHERIT;
> > +			ip->i_diflags |= XFS_DIFLAG_RTINHERIT;
> 
> The removal of the di_flags seems like an unrelated (though nice)
> cleanup.

Ok fine I'll do this bugfix properly and turn the cleanups into new
separate patches and add all three to the 15+ cleanup patches I didn't
manage to get merged last cycle.

--D
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 045118c7bf78..dce267dbea5f 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -589,8 +589,21 @@  xfs_inode_validate_extsize(
 	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
 	extsize_bytes = XFS_FSB_TO_B(mp, extsize);
 
+	/*
+	 * Historically, XFS didn't check that the extent size hint was an
+	 * integer multiple of the rt extent size on a directory with both
+	 * rtinherit and extszinherit flags set.  This results in math errors
+	 * in the rt allocator and inode verifier errors when the unaligned
+	 * hint value propagates into new realtime files.  Since there might
+	 * be filesystems in the wild, the best we can do for now is to
+	 * mitigate the harms by stopping the propagation.
+	 *
+	 * The next time we add a new inode feature, the if test below should
+	 * also trigger if that new feature is enabled and (rtinherit_flag &&
+	 * inherit_flag).
+	 */
 	if (rt_flag)
-		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
+		blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
 	else
 		blocksize_bytes = mp->m_sb.sb_blocksize;
 
diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index 78324e043e25..cb5e04ed2654 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -142,6 +142,19 @@  xfs_trans_log_inode(
 		flags |= XFS_ILOG_CORE;
 	}
 
+	/*
+	 * Clear invalid extent size hints set on files with rtinherit and
+	 * extszinherit set.  See the comments in xfs_inode_validate_extsize
+	 * for details.
+	 */
+	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
+	    (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
+	    (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
+		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT);
+		ip->i_extsize = 0;
+		flags |= XFS_ILOG_CORE;
+	}
+
 	/*
 	 * Record the specific change for fdatasync optimisation. This allows
 	 * fdatasync to skip log forces for inodes that are only timestamp
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 0369eb22c1bb..32458f3f686c 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -689,46 +689,56 @@  xfs_inode_inherit_flags(
 	struct xfs_inode	*ip,
 	const struct xfs_inode	*pip)
 {
-	unsigned int		di_flags = 0;
+	xfs_failaddr_t		failaddr;
 	umode_t			mode = VFS_I(ip)->i_mode;
 
 	if (S_ISDIR(mode)) {
 		if (pip->i_diflags & XFS_DIFLAG_RTINHERIT)
-			di_flags |= XFS_DIFLAG_RTINHERIT;
+			ip->i_diflags |= XFS_DIFLAG_RTINHERIT;
 		if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) {
-			di_flags |= XFS_DIFLAG_EXTSZINHERIT;
+			ip->i_diflags |= XFS_DIFLAG_EXTSZINHERIT;
 			ip->i_extsize = pip->i_extsize;
 		}
 		if (pip->i_diflags & XFS_DIFLAG_PROJINHERIT)
-			di_flags |= XFS_DIFLAG_PROJINHERIT;
+			ip->i_diflags |= XFS_DIFLAG_PROJINHERIT;
 	} else if (S_ISREG(mode)) {
 		if ((pip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
 		    xfs_sb_version_hasrealtime(&ip->i_mount->m_sb))
-			di_flags |= XFS_DIFLAG_REALTIME;
+			ip->i_diflags |= XFS_DIFLAG_REALTIME;
 		if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) {
-			di_flags |= XFS_DIFLAG_EXTSIZE;
+			ip->i_diflags |= XFS_DIFLAG_EXTSIZE;
 			ip->i_extsize = pip->i_extsize;
 		}
 	}
 	if ((pip->i_diflags & XFS_DIFLAG_NOATIME) &&
 	    xfs_inherit_noatime)
-		di_flags |= XFS_DIFLAG_NOATIME;
+		ip->i_diflags |= XFS_DIFLAG_NOATIME;
 	if ((pip->i_diflags & XFS_DIFLAG_NODUMP) &&
 	    xfs_inherit_nodump)
-		di_flags |= XFS_DIFLAG_NODUMP;
+		ip->i_diflags |= XFS_DIFLAG_NODUMP;
 	if ((pip->i_diflags & XFS_DIFLAG_SYNC) &&
 	    xfs_inherit_sync)
-		di_flags |= XFS_DIFLAG_SYNC;
+		ip->i_diflags |= XFS_DIFLAG_SYNC;
 	if ((pip->i_diflags & XFS_DIFLAG_NOSYMLINKS) &&
 	    xfs_inherit_nosymlinks)
-		di_flags |= XFS_DIFLAG_NOSYMLINKS;
+		ip->i_diflags |= XFS_DIFLAG_NOSYMLINKS;
 	if ((pip->i_diflags & XFS_DIFLAG_NODEFRAG) &&
 	    xfs_inherit_nodefrag)
-		di_flags |= XFS_DIFLAG_NODEFRAG;
+		ip->i_diflags |= XFS_DIFLAG_NODEFRAG;
 	if (pip->i_diflags & XFS_DIFLAG_FILESTREAM)
-		di_flags |= XFS_DIFLAG_FILESTREAM;
+		ip->i_diflags |= XFS_DIFLAG_FILESTREAM;
 
-	ip->i_diflags |= di_flags;
+	/*
+	 * Make sure the extsize actually validates properly.  See the
+	 * comments in xfs_inode_validate_extsize for details.
+	 */
+	failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize,
+			VFS_I(ip)->i_mode, ip->i_diflags);
+	if (failaddr) {
+		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
+				   XFS_DIFLAG_EXTSZINHERIT);
+		ip->i_extsize = 0;
+	}
 }
 
 /* Propagate di_flags2 from a parent inode to a child inode. */
@@ -737,12 +747,22 @@  xfs_inode_inherit_flags2(
 	struct xfs_inode	*ip,
 	const struct xfs_inode	*pip)
 {
+	xfs_failaddr_t		failaddr;
+
 	if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) {
 		ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE;
 		ip->i_cowextsize = pip->i_cowextsize;
 	}
 	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
 		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
+
+	/* Make sure the cowextsize actually validates properly. */
+	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
+			VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2);
+	if (failaddr) {
+		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+		ip->i_cowextsize = 0;
+	}
 }
 
 /*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 6407921aca96..0d1d5e32ab9d 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1291,6 +1291,20 @@  xfs_ioctl_setattr_check_extsize(
 
 	new_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
 
+	/*
+	 * Prevent the sysadmin from misconfiguring files to have an invalid
+	 * extent size hint set if rtinherit and extszinherit are set.  See the
+	 * comments in xfs_inode_validate_extsize for details.
+	 */
+	if ((new_diflags & XFS_DIFLAG_RTINHERIT) &&
+	    (new_diflags & XFS_DIFLAG_EXTSZINHERIT)) {
+		unsigned int	rtextsize_bytes;
+
+		rtextsize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
+		if (fa->fsx_extsize % rtextsize_bytes)
+			return -EINVAL;
+	}
+
 	failaddr = xfs_inode_validate_extsize(ip->i_mount,
 			XFS_B_TO_FSB(mp, fa->fsx_extsize),
 			VFS_I(ip)->i_mode, new_diflags);