Message ID | 162152894725.2694219.2966158387963381824.stgit@magnolia (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | xfs: strengthen validation of extent size hints | expand |
On Thu, May 20, 2021 at 09:42:27AM -0700, Darrick J. Wong wrote: > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index 045118c7bf78..dce267dbea5f 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -589,8 +589,21 @@ xfs_inode_validate_extsize( > inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT); > extsize_bytes = XFS_FSB_TO_B(mp, extsize); > > + /* > + * Historically, XFS didn't check that the extent size hint was an > + * integer multiple of the rt extent size on a directory with both > + * rtinherit and extszinherit flags set. This results in math errors > + * in the rt allocator and inode verifier errors when the unaligned > + * hint value propagates into new realtime files. Since there might > + * be filesystems in the wild, the best we can do for now is to > + * mitigate the harms by stopping the propagation. > + * > + * The next time we add a new inode feature, the if test below should > + * also trigger if that new feature is enabled and (rtinherit_flag && > + * inherit_flag). > + */ I don't understand how this comment relates to the change below, and in fact I don't understand the last paragraph at all. > if (rt_flag) > - blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog; > + blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); This just looks like a cleanup, that is replace the open coded version of XFS_FSB_TO_B with the actual helper. > + /* > + * Clear invalid extent size hints set on files with rtinherit and > + * extszinherit set. See the comments in xfs_inode_validate_extsize > + * for details. > + */ Maybe that comment should be here as it makes a whole lot more sense where we do the actual clearing. > + if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && > + (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && > + (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { > + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT); Overly long line. > + xfs_failaddr_t failaddr; > umode_t mode = VFS_I(ip)->i_mode; > > if (S_ISDIR(mode)) { > if (pip->i_diflags & XFS_DIFLAG_RTINHERIT) > - di_flags |= XFS_DIFLAG_RTINHERIT; > + ip->i_diflags |= XFS_DIFLAG_RTINHERIT; The removal of the di_flags seems like an unrelated (though nice) cleanup.
On Fri, May 21, 2021 at 08:49:03AM +0100, Christoph Hellwig wrote: > On Thu, May 20, 2021 at 09:42:27AM -0700, Darrick J. Wong wrote: > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > > index 045118c7bf78..dce267dbea5f 100644 > > --- a/fs/xfs/libxfs/xfs_inode_buf.c > > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > > @@ -589,8 +589,21 @@ xfs_inode_validate_extsize( > > inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT); > > extsize_bytes = XFS_FSB_TO_B(mp, extsize); > > > > + /* > > + * Historically, XFS didn't check that the extent size hint was an > > + * integer multiple of the rt extent size on a directory with both > > + * rtinherit and extszinherit flags set. This results in math errors > > + * in the rt allocator and inode verifier errors when the unaligned > > + * hint value propagates into new realtime files. Since there might > > + * be filesystems in the wild, the best we can do for now is to > > + * mitigate the harms by stopping the propagation. > > + * > > + * The next time we add a new inode feature, the if test below should > > + * also trigger if that new feature is enabled and (rtinherit_flag && > > + * inherit_flag). > > + */ > > I don't understand how this comment relates to the change below, and > in fact I don't understand the last paragraph at all. It doesn't relate to the cleanup below at all. I put a comment there to explain why this verifier function doesn't check the rextsize alignment of the hint. In other words, it's a comment describing a gap in a validation function and why we can't remove the gap here but have to sprinkle mitigations everywhere else. Earlier iterations of this patch simply fixed the verifier and allowed existing filesystems to fail. Brian and I weren't totally comfortable with introducing a breaking change like that even though the consequences of the bad hint actually propagating into a realtime file are immediate filesystem corruption shutdowns, so this iteration mitigates the propagation issue by fixing directories when they get rewritten and preventing propagation of bad hints into new files. I don't want to introduce a new feature/inode flag just to say "corrected rt extent size hint" since having an extent size hint set on a filesystem with a realtime extent size larger than 1 fsb on a filesystem with a realtime volume configured at all is a vanishingly rare condition. Granted, we could decide to introduce the breaking verifier change and say "fmeh to this corner case, if you did that you're screwed anyway", but I think I want a few more ACKs on /that/ strategy before I do that. > > if (rt_flag) > > - blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog; > > + blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); > > This just looks like a cleanup, that is replace the open coded version > of XFS_FSB_TO_B with the actual helper. Yes. As I mentioned above, the comment documents code that isn't there and cannot be added. > > + /* > > + * Clear invalid extent size hints set on files with rtinherit and > > + * extszinherit set. See the comments in xfs_inode_validate_extsize > > + * for details. > > + */ > > Maybe that comment should be here as it makes a whole lot more sense > where we do the actual clearing. Ok. > > > + if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && > > + (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && > > + (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { > > + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT); > > Overly long line. Fixed, though Linus said to stop caring about code that goes slightly over. > > + xfs_failaddr_t failaddr; > > umode_t mode = VFS_I(ip)->i_mode; > > > > if (S_ISDIR(mode)) { > > if (pip->i_diflags & XFS_DIFLAG_RTINHERIT) > > - di_flags |= XFS_DIFLAG_RTINHERIT; > > + ip->i_diflags |= XFS_DIFLAG_RTINHERIT; > > The removal of the di_flags seems like an unrelated (though nice) > cleanup. Ok fine I'll do this bugfix properly and turn the cleanups into new separate patches and add all three to the 15+ cleanup patches I didn't manage to get merged last cycle. --D
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 045118c7bf78..dce267dbea5f 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -589,8 +589,21 @@ xfs_inode_validate_extsize( inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT); extsize_bytes = XFS_FSB_TO_B(mp, extsize); + /* + * Historically, XFS didn't check that the extent size hint was an + * integer multiple of the rt extent size on a directory with both + * rtinherit and extszinherit flags set. This results in math errors + * in the rt allocator and inode verifier errors when the unaligned + * hint value propagates into new realtime files. Since there might + * be filesystems in the wild, the best we can do for now is to + * mitigate the harms by stopping the propagation. + * + * The next time we add a new inode feature, the if test below should + * also trigger if that new feature is enabled and (rtinherit_flag && + * inherit_flag). + */ if (rt_flag) - blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog; + blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); else blocksize_bytes = mp->m_sb.sb_blocksize; diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c index 78324e043e25..cb5e04ed2654 100644 --- a/fs/xfs/libxfs/xfs_trans_inode.c +++ b/fs/xfs/libxfs/xfs_trans_inode.c @@ -142,6 +142,19 @@ xfs_trans_log_inode( flags |= XFS_ILOG_CORE; } + /* + * Clear invalid extent size hints set on files with rtinherit and + * extszinherit set. See the comments in xfs_inode_validate_extsize + * for details. + */ + if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && + (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && + (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT); + ip->i_extsize = 0; + flags |= XFS_ILOG_CORE; + } + /* * Record the specific change for fdatasync optimisation. This allows * fdatasync to skip log forces for inodes that are only timestamp diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 0369eb22c1bb..32458f3f686c 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -689,46 +689,56 @@ xfs_inode_inherit_flags( struct xfs_inode *ip, const struct xfs_inode *pip) { - unsigned int di_flags = 0; + xfs_failaddr_t failaddr; umode_t mode = VFS_I(ip)->i_mode; if (S_ISDIR(mode)) { if (pip->i_diflags & XFS_DIFLAG_RTINHERIT) - di_flags |= XFS_DIFLAG_RTINHERIT; + ip->i_diflags |= XFS_DIFLAG_RTINHERIT; if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { - di_flags |= XFS_DIFLAG_EXTSZINHERIT; + ip->i_diflags |= XFS_DIFLAG_EXTSZINHERIT; ip->i_extsize = pip->i_extsize; } if (pip->i_diflags & XFS_DIFLAG_PROJINHERIT) - di_flags |= XFS_DIFLAG_PROJINHERIT; + ip->i_diflags |= XFS_DIFLAG_PROJINHERIT; } else if (S_ISREG(mode)) { if ((pip->i_diflags & XFS_DIFLAG_RTINHERIT) && xfs_sb_version_hasrealtime(&ip->i_mount->m_sb)) - di_flags |= XFS_DIFLAG_REALTIME; + ip->i_diflags |= XFS_DIFLAG_REALTIME; if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { - di_flags |= XFS_DIFLAG_EXTSIZE; + ip->i_diflags |= XFS_DIFLAG_EXTSIZE; ip->i_extsize = pip->i_extsize; } } if ((pip->i_diflags & XFS_DIFLAG_NOATIME) && xfs_inherit_noatime) - di_flags |= XFS_DIFLAG_NOATIME; + ip->i_diflags |= XFS_DIFLAG_NOATIME; if ((pip->i_diflags & XFS_DIFLAG_NODUMP) && xfs_inherit_nodump) - di_flags |= XFS_DIFLAG_NODUMP; + ip->i_diflags |= XFS_DIFLAG_NODUMP; if ((pip->i_diflags & XFS_DIFLAG_SYNC) && xfs_inherit_sync) - di_flags |= XFS_DIFLAG_SYNC; + ip->i_diflags |= XFS_DIFLAG_SYNC; if ((pip->i_diflags & XFS_DIFLAG_NOSYMLINKS) && xfs_inherit_nosymlinks) - di_flags |= XFS_DIFLAG_NOSYMLINKS; + ip->i_diflags |= XFS_DIFLAG_NOSYMLINKS; if ((pip->i_diflags & XFS_DIFLAG_NODEFRAG) && xfs_inherit_nodefrag) - di_flags |= XFS_DIFLAG_NODEFRAG; + ip->i_diflags |= XFS_DIFLAG_NODEFRAG; if (pip->i_diflags & XFS_DIFLAG_FILESTREAM) - di_flags |= XFS_DIFLAG_FILESTREAM; + ip->i_diflags |= XFS_DIFLAG_FILESTREAM; - ip->i_diflags |= di_flags; + /* + * Make sure the extsize actually validates properly. See the + * comments in xfs_inode_validate_extsize for details. + */ + failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize, + VFS_I(ip)->i_mode, ip->i_diflags); + if (failaddr) { + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | + XFS_DIFLAG_EXTSZINHERIT); + ip->i_extsize = 0; + } } /* Propagate di_flags2 from a parent inode to a child inode. */ @@ -737,12 +747,22 @@ xfs_inode_inherit_flags2( struct xfs_inode *ip, const struct xfs_inode *pip) { + xfs_failaddr_t failaddr; + if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) { ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE; ip->i_cowextsize = pip->i_cowextsize; } if (pip->i_diflags2 & XFS_DIFLAG2_DAX) ip->i_diflags2 |= XFS_DIFLAG2_DAX; + + /* Make sure the cowextsize actually validates properly. */ + failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize, + VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2); + if (failaddr) { + ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE; + ip->i_cowextsize = 0; + } } /* diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 6407921aca96..0d1d5e32ab9d 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1291,6 +1291,20 @@ xfs_ioctl_setattr_check_extsize( new_diflags = xfs_flags2diflags(ip, fa->fsx_xflags); + /* + * Prevent the sysadmin from misconfiguring files to have an invalid + * extent size hint set if rtinherit and extszinherit are set. See the + * comments in xfs_inode_validate_extsize for details. + */ + if ((new_diflags & XFS_DIFLAG_RTINHERIT) && + (new_diflags & XFS_DIFLAG_EXTSZINHERIT)) { + unsigned int rtextsize_bytes; + + rtextsize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); + if (fa->fsx_extsize % rtextsize_bytes) + return -EINVAL; + } + failaddr = xfs_inode_validate_extsize(ip->i_mount, XFS_B_TO_FSB(mp, fa->fsx_extsize), VFS_I(ip)->i_mode, new_diflags);