Message ID | 20240511003426.13858-1-wen.gang.wang@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | xfs: make sure sb_fdblocks is non-negative | expand |
On Fri, May 10, 2024 at 05:34:26PM -0700, Wengang Wang wrote: > when writting super block to disk (in xfs_log_sb), sb_fdblocks is fetched from > m_fdblocks without any lock. As m_fdblocks can experience a positive -> negativ > -> positive changing when the FS reaches fullness (see xfs_mod_fdblocks) > So there is a chance that sb_fdblocks is negative, and because sb_fdblocks is > type of unsigned long long, it reads super big. And sb_fdblocks being bigger > than sb_dblocks is a problem during log recovery, xfs_validate_sb_write() > complains. > > Fix: > As sb_fdblocks will be re-calculated during mount when lazysbcount is enabled, > We just need to make xfs_validate_sb_write() happy -- make sure sb_fdblocks is > not genative. Ok, I have no problems with the change being made, but I'm unclear on why we care if these values get logged as large positive numbers? The comment above this code explains that these counts are known to be inaccurate and so are not trusted. i.e. they will be corrected post-log recovery if they are recovered from the log: * Lazy sb counters don't update the in-core superblock so do that now. * If this is at unmount, the counters will be exactly correct, but at * any other time they will only be ballpark correct because of * reservations that have been taken out percpu counters. If we have an * unclean shutdown, this will be corrected by log recovery rebuilding * the counters from the AGF block counts. IOWs journal recovery doesn't actually care what these values are, so what actually goes wrong if this sum returns a negative value? Cheers, Dave.
Hi Dave, Please see inlines, > On May 10, 2024, at 6:17 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Fri, May 10, 2024 at 05:34:26PM -0700, Wengang Wang wrote: >> when writting super block to disk (in xfs_log_sb), sb_fdblocks is fetched from >> m_fdblocks without any lock. As m_fdblocks can experience a positive -> negativ >> -> positive changing when the FS reaches fullness (see xfs_mod_fdblocks) >> So there is a chance that sb_fdblocks is negative, and because sb_fdblocks is >> type of unsigned long long, it reads super big. And sb_fdblocks being bigger >> than sb_dblocks is a problem during log recovery, xfs_validate_sb_write() >> complains. >> >> Fix: >> As sb_fdblocks will be re-calculated during mount when lazysbcount is enabled, >> We just need to make xfs_validate_sb_write() happy -- make sure sb_fdblocks is >> not genative. > > Ok, I have no problems with the change being made, but I'm unclear > on why we care if these values get logged as large positive numbers? > > The comment above this code explains that these counts are known to > be inaccurate and so are not trusted. i.e. they will be corrected > post-log recovery if they are recovered from the log: > > * Lazy sb counters don't update the in-core superblock so do that now. > * If this is at unmount, the counters will be exactly correct, but at > * any other time they will only be ballpark correct because of > * reservations that have been taken out percpu counters. If we have an > * unclean shutdown, this will be corrected by log recovery rebuilding > * the counters from the AGF block counts. > Things is that we have a metadump, looking at the fdblocks from super block 0, it is good. $ xfs_db -c "sb 0" -c "p" cust.img |egrep "dblocks|ifree|icount" dblocks = 26214400 icount = 512 ifree = 337 fdblocks = 25997100 And when looking at the log, we have the following: $ egrep -a "fdblocks|icount|ifree" cust.log |tail sb_fdblocks 37 sb_icount 1056 sb_ifree 87 sb_fdblocks 37 sb_icount 1056 sb_ifree 87 sb_fdblocks 37 sb_icount 1056 sb_ifree 87 sb_fdblocks 18446744073709551604 # cust.log is output of my script which tries to parse the log buffer. 18446744073709551604ULL == 0xfffffffffffffff4 or -12LL With upstream kernel (6.7.0-rc3), when I tried to mount (log recover) the metadump, I got the following in dmesg: [ 52.927796] XFS (loop0): SB summary counter sanity check failed [ 52.928889] XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x60/0x110 [xfs], xfs_sb block 0x0 [ 52.930890] XFS (loop0): Unmount and run xfs_repair [ 52.931797] XFS (loop0): First 128 bytes of corrupted metadata buffer: [ 52.932954] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00 XFSB............ [ 52.934333] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 52.935733] 00000020: c9 c1 ed ae 84 ed 46 b9 a1 f0 09 57 4a a9 98 42 ......F....WJ..B [ 52.937120] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80 ................ [ 52.938515] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................ [ 52.939919] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00 .....d.......... [ 52.941293] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00 ..d............. [ 52.942661] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19 ................ [ 52.944046] XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x38b/0x3a0 [xfs] (fs/xfs/xfs_buf.c:1559). Shutting down filesystem. [ 52.946710] XFS (loop0): Please unmount the filesystem and rectify the problem(s) [ 52.948099] XFS (loop0): log mount/recovery failed: error -117 [ 52.949810] XFS (loop0): log mount failed Looking at corresponding code: 231 xfs_validate_sb_write( 232 struct xfs_mount *mp, 233 struct xfs_buf *bp, 234 struct xfs_sb *sbp) 235 { 236 /* 237 * Carry out additional sb summary counter sanity checks when we write 238 * the superblock. We skip this in the read validator because there 239 * could be newer superblocks in the log and if the values are garbage 240 * even after replay we'll recalculate them at the end of log mount. 241 * 242 * mkfs has traditionally written zeroed counters to inprogress and 243 * secondary superblocks, so allow this usage to continue because 244 * we never read counters from such superblocks. 245 */ 246 if (xfs_buf_daddr(bp) == XFS_SB_DADDR && !sbp->sb_inprogress && 247 (sbp->sb_fdblocks > sbp->sb_dblocks || 248 !xfs_verify_icount(mp, sbp->sb_icount) || 249 sbp->sb_ifree > sbp->sb_icount)) { 250 xfs_warn(mp, "SB summary counter sanity check failed"); 251 return -EFSCORRUPTED; 252 } From dmesg and code, we know the check failure was due to bad sb_ifree vs sb_icount or bad sb_fdblocks vs sb_dblocks. Looking at the super block dump and log dump, We know ifree and icount are good, what’s bad is sb_fdblocks. And that sb_fdblocks is from log. # I verified that sb_fdblocks is 0xfffffffffffffff4 with a UEK debug kernel (though not 6.7.0-rc3) So the sb_fdblocks is updated from log to incore at xfs_log_sb() -> xfs_validate_sb_write() path though Should be may re-calculated from AGs. The fix aims to make xfs_validate_sb_write() happy. Thanks, Wengang > IOWs journal recovery doesn't actually care what these values are, > so what actually goes wrong if this sum returns a negative value? > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 73a4b895de67..199756970383 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -1037,7 +1037,7 @@ xfs_log_sb( mp->m_sb.sb_ifree = min_t(uint64_t, percpu_counter_sum(&mp->m_ifree), mp->m_sb.sb_icount); - mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); + mp->m_sb.sb_fdblocks = percpu_counter_sum_positive(&mp->m_fdblocks); } xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
when writting super block to disk (in xfs_log_sb), sb_fdblocks is fetched from m_fdblocks without any lock. As m_fdblocks can experience a positive -> negativ -> positive changing when the FS reaches fullness (see xfs_mod_fdblocks) So there is a chance that sb_fdblocks is negative, and because sb_fdblocks is type of unsigned long long, it reads super big. And sb_fdblocks being bigger than sb_dblocks is a problem during log recovery, xfs_validate_sb_write() complains. Fix: As sb_fdblocks will be re-calculated during mount when lazysbcount is enabled, We just need to make xfs_validate_sb_write() happy -- make sure sb_fdblocks is not genative. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> --- fs/xfs/libxfs/xfs_sb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)