Message ID | 20231110044500.718022-3-david@fromorbit.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | xfs: fix recovery corruption on s390 w/ nrext64 | expand |
On Fri, Nov 10, 2023 at 03:33:14PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > Because on v3 inodes, di_flushiter doesn't exist. It overlaps with > zero padding in the inode, except when NREXT64=1 configurations are > in use and the zero padding is no longer padding but holds the 64 > bit extent counter. > > This manifests obviously on big endian platforms (e.g. s390) because > the log dinode is in host order and the overlap is the LSBs of the > extent count field. It is not noticed on little endian machines > because the overlap is at the MSB end of the extent count field and > we need to get more than 2^^48 extents in the inode before it > manifests. i.e. the heat death of the universe will occur before we > see the problem in little endian machines. > > This is a zero-day issue for NREXT64=1 configuraitons on big endian > machines. Fix it by only clearing di_flushiter on v2 inodes during > recovery. > > Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers") > cc: stable@kernel.org # 5.19+ > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_inode_item_recover.c | 32 +++++++++++++++++--------------- > 1 file changed, 17 insertions(+), 15 deletions(-) > > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c > index f4c31c2b60d5..dbdab4ce7c44 100644 > --- a/fs/xfs/xfs_inode_item_recover.c > +++ b/fs/xfs/xfs_inode_item_recover.c > @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2( > * superblock flag to determine whether we need to look at di_flushiter > * to skip replay when the on disk inode is newer than the log one > */ > - if (!xfs_has_v3inodes(mp) && > - ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { > - /* > - * Deal with the wrap case, DI_MAX_FLUSH is less > - * than smaller numbers > - */ > - if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && > - ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { > - /* do nothing */ > - } else { > - trace_xfs_log_recover_inode_skip(log, in_f); > - error = 0; > - goto out_release; > + if (!xfs_has_v3inodes(mp)) { > + if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { > + /* > + * Deal with the wrap case, DI_MAX_FLUSH is less > + * than smaller numbers > + */ > + if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && > + ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { > + /* do nothing */ > + } else { > + trace_xfs_log_recover_inode_skip(log, in_f); > + error = 0; > + goto out_release; > + } > } > + > + /* Take the opportunity to reset the flush iteration count */ > + ldip->di_flushiter = 0; Hmm. Well this fixes the zeroday problem, so thank you for getting the root of this! Reviewed-by: Darrick J. Wong <djwong@kernel.org> Though hch did suggest reducing the amount of indenting here by compressing the if tests together. I can't decide if it's worth rearranging that old V4 code since none of it's scheduled for removal until 2030, but it /is/ legacy code that maybe we just don't care to touch? <shrug> --D > } > > - /* Take the opportunity to reset the flush iteration count */ > - ldip->di_flushiter = 0; > > if (unlikely(S_ISREG(ldip->di_mode))) { > if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) && > -- > 2.42.0 >
On Fri, Nov 10, 2023 at 11:32:55AM -0800, Darrick J. Wong wrote: > On Fri, Nov 10, 2023 at 03:33:14PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > Because on v3 inodes, di_flushiter doesn't exist. It overlaps with > > zero padding in the inode, except when NREXT64=1 configurations are > > in use and the zero padding is no longer padding but holds the 64 > > bit extent counter. > > > > This manifests obviously on big endian platforms (e.g. s390) because > > the log dinode is in host order and the overlap is the LSBs of the > > extent count field. It is not noticed on little endian machines > > because the overlap is at the MSB end of the extent count field and > > we need to get more than 2^^48 extents in the inode before it > > manifests. i.e. the heat death of the universe will occur before we > > see the problem in little endian machines. > > > > This is a zero-day issue for NREXT64=1 configuraitons on big endian > > machines. Fix it by only clearing di_flushiter on v2 inodes during > > recovery. > > > > Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers") > > cc: stable@kernel.org # 5.19+ > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > > fs/xfs/xfs_inode_item_recover.c | 32 +++++++++++++++++--------------- > > 1 file changed, 17 insertions(+), 15 deletions(-) > > > > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c > > index f4c31c2b60d5..dbdab4ce7c44 100644 > > --- a/fs/xfs/xfs_inode_item_recover.c > > +++ b/fs/xfs/xfs_inode_item_recover.c > > @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2( > > * superblock flag to determine whether we need to look at di_flushiter > > * to skip replay when the on disk inode is newer than the log one > > */ > > - if (!xfs_has_v3inodes(mp) && > > - ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { > > - /* > > - * Deal with the wrap case, DI_MAX_FLUSH is less > > - * than smaller numbers > > - */ > > - if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && > > - ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { > > - /* do nothing */ > > - } else { > > - trace_xfs_log_recover_inode_skip(log, in_f); > > - error = 0; > > - goto out_release; > > + if (!xfs_has_v3inodes(mp)) { > > + if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { > > + /* > > + * Deal with the wrap case, DI_MAX_FLUSH is less > > + * than smaller numbers > > + */ > > + if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && > > + ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { > > + /* do nothing */ > > + } else { > > + trace_xfs_log_recover_inode_skip(log, in_f); > > + error = 0; > > + goto out_release; > > + } > > } > > + > > + /* Take the opportunity to reset the flush iteration count */ > > + ldip->di_flushiter = 0; > > Hmm. Well this fixes the zeroday problem, so thank you for getting the > root of this! > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > Though hch did suggest reducing the amount of indenting here by > compressing the if tests together. I can't decide if it's worth > rearranging that old V4 code since none of it's scheduled for removal > until 2030, but it /is/ legacy code that maybe we just don't care to > touch? I'd prefer not to touch it. It's now all isolated in a "!xfs_has_v3inodes()" branch, so I think we can largely ignore the grot for now. If we have to do a larger rework of this code in future, then we can look at reworking it. But right now, I really don't feel like risking breaking something else by doing unnecessary cleanups on code we haven't needed to touch in over a decade. -Dave.
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index f4c31c2b60d5..dbdab4ce7c44 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2( * superblock flag to determine whether we need to look at di_flushiter * to skip replay when the on disk inode is newer than the log one */ - if (!xfs_has_v3inodes(mp) && - ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { - /* - * Deal with the wrap case, DI_MAX_FLUSH is less - * than smaller numbers - */ - if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && - ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { - /* do nothing */ - } else { - trace_xfs_log_recover_inode_skip(log, in_f); - error = 0; - goto out_release; + if (!xfs_has_v3inodes(mp)) { + if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { + /* + * Deal with the wrap case, DI_MAX_FLUSH is less + * than smaller numbers + */ + if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && + ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { + /* do nothing */ + } else { + trace_xfs_log_recover_inode_skip(log, in_f); + error = 0; + goto out_release; + } } + + /* Take the opportunity to reset the flush iteration count */ + ldip->di_flushiter = 0; } - /* Take the opportunity to reset the flush iteration count */ - ldip->di_flushiter = 0; if (unlikely(S_ISREG(ldip->di_mode))) { if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&