Message ID | 20240603183011.2690-1-wen.gang.wang@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Series | [V3] xfs: make sure sb_fdblocks is non-negative |
On Mon, Jun 03, 2024 at 11:30:11AM -0700, Wengang Wang wrote:
> A user with a completely full filesystem experienced an unexpected
> shutdown when the filesystem tried to write the superblock during
> runtime.
> kernel shows the following dmesg:
>
> [ 8.176281] XFS (dm-4): Metadata corruption detected at xfs_sb_write_verify+0x60/0x120 [xfs], xfs_sb block 0x0
> [ 8.177417] XFS (dm-4): Unmount and run xfs_repair
> [ 8.178016] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
> [ 8.178703] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00 XFSB............
> [ 8.179487] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 8.180312] 00000020: cf 12 dc 89 ca 26 45 29 92 e6 e3 8d 3b b8 a2 c3 .....&E)....;...
> [ 8.181150] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80 ................
> [ 8.182003] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................
> [ 8.182004] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00 .....d..........
> [ 8.182004] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00 ..d.............
> [ 8.182005] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19 ................
> [ 8.182008] XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem
> [ 8.182010] XFS (dm-4): Please unmount the filesystem and rectify the problem(s)
>
> When xfs_log_sb writes super block to disk, b_fdblocks is fetched from
> m_fdblocks without any lock. As m_fdblocks can experience a positive -> negative
> -> positive changing when the FS reaches fullness (see xfs_mod_fdblocks)
> So there is a chance that sb_fdblocks is negative, and because sb_fdblocks is
> type of unsigned long long, it reads super big. And sb_fdblocks being bigger
> than sb_dblocks is a problem during log recovery, xfs_validate_sb_write()
> complains.

As I explained in the previous review thread, this "summing can be
transiently negative" behaviour is "native" to percpu counters. i.e.
percpu_counter_sum() does not require the xfs_mod_fdblocks() behaviour
to return negative values because the sum's guaranteed accuracy is only
+/-(batch size * nrcpus).

Hence all the percpu_counter_sum() counter calls in xfs_log_sb() need
to use percpu_counter_sum_positive() to avoid logging transient negative
values to the journal, not just the mp->m_fdblocks counter.

-Dave.
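To make the failure mode concrete, here is a minimal userspace sketch of what happens when a transiently negative signed sum is stored into an unsigned 64-bit superblock field, and how clamping at zero avoids it. This is not kernel code: `struct model_percpu_counter`, `NR_CPUS_MODEL`, `model_sum()`, `model_sum_positive()` and the two `log_fdblocks_*()` helpers are hypothetical names invented for illustration, and the model ignores locking and batching entirely.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for this model */

/* Simplified stand-in for a percpu counter: a central count plus
 * per-CPU deltas that have not been folded into it yet. */
struct model_percpu_counter {
	int64_t count;
	int32_t pcpu[NR_CPUS_MODEL];
};

/* Models percpu_counter_sum(): central count plus every per-CPU delta.
 * The result is signed and may be negative for some snapshots. */
static int64_t model_sum(const struct model_percpu_counter *c)
{
	int64_t sum = c->count;

	for (int i = 0; i < NR_CPUS_MODEL; i++)
		sum += c->pcpu[i];
	return sum;
}

/* Models percpu_counter_sum_positive(): same sum, clamped at zero. */
static int64_t model_sum_positive(const struct model_percpu_counter *c)
{
	int64_t sum = model_sum(c);

	return sum < 0 ? 0 : sum;
}

/* Storing the raw signed sum in an unsigned 64-bit field: a negative
 * sum wraps around to a value near UINT64_MAX. */
static uint64_t log_fdblocks_unclamped(const struct model_percpu_counter *c)
{
	return (uint64_t)model_sum(c);
}

/* Storing the clamped sum: a negative snapshot is logged as 0. */
static uint64_t log_fdblocks_clamped(const struct model_percpu_counter *c)
{
	return (uint64_t)model_sum_positive(c);
}
```

With a snapshot such as `{ .count = 2, .pcpu = { -3, 0, 0, 0 } }` the signed sum is -1, so the unclamped store yields `UINT64_MAX`, which is far larger than any plausible sb_dblocks and would trip the write verifier, while the clamped store yields 0.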
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 09e4bf949bf8..252bfa9a9fdb 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1042,7 +1042,8 @@ xfs_log_sb(
 		mp->m_sb.sb_ifree = min_t(uint64_t,
 				percpu_counter_sum(&mp->m_ifree),
 				mp->m_sb.sb_icount);
-		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
+		mp->m_sb.sb_fdblocks =
+				percpu_counter_sum_positive(&mp->m_fdblocks);
 	}
 
 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);