[V3] xfs: make sure sb_fdblocks is non-negative

Message ID

20240603183011.2690-1-wen.gang.wang@oracle.com

State

Superseded

Headers

From: Wengang Wang <wen.gang.wang@oracle.com>
To: linux-xfs@vger.kernel.org
Cc: wen.gang.wang@oracle.com, djwong@kernel.org, hch@lst.de
Subject: [PATCH V3] xfs: make sure sb_fdblocks is non-negative
Date: Mon,  3 Jun 2024 11:30:11 -0700
Message-Id: <20240603183011.2690-1-wen.gang.wang@oracle.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

[V3] xfs: make sure sb_fdblocks is non-negative

Commit Message

Wengang Wang June 3, 2024, 6:30 p.m. UTC

A user with a completely full filesystem experienced an unexpected
shutdown when the filesystem tried to write the superblock at
runtime. The kernel logged the following messages:

[    8.176281] XFS (dm-4): Metadata corruption detected at xfs_sb_write_verify+0x60/0x120 [xfs], xfs_sb block 0x0
[    8.177417] XFS (dm-4): Unmount and run xfs_repair
[    8.178016] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
[    8.178703] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00  XFSB............
[    8.179487] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[    8.180312] 00000020: cf 12 dc 89 ca 26 45 29 92 e6 e3 8d 3b b8 a2 c3  .....&E)....;...
[    8.181150] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80  ................
[    8.182003] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82  ................
[    8.182004] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00  .....d..........
[    8.182004] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00  ..d.............
[    8.182005] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19  ................
[    8.182008] XFS (dm-4): Corruption of in-memory data detected.  Shutting down filesystem
[    8.182010] XFS (dm-4): Please unmount the filesystem and rectify the problem(s)

When xfs_log_sb() writes the superblock to disk, sb_fdblocks is computed
from the m_fdblocks percpu counter without any lock. Because m_fdblocks can
briefly go from positive to negative and back to positive when the
filesystem is full (see xfs_mod_fdblocks()), sb_fdblocks can end up
negative. sb_fdblocks is an unsigned 64-bit (unsigned long long) field, so a
negative value reads back as a huge number. An sb_fdblocks larger than
sb_dblocks is a problem during log recovery, so xfs_validate_sb_write()
rejects the superblock write.

Fix:
Because sb_fdblocks is recalculated at mount time when lazysbcount is
enabled, we only need to satisfy xfs_validate_sb_write() by making sure
sb_fdblocks is never negative.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
V2 -> V3: break the line to ensure it isn't overly long
V1 -> V2: add problem symptoms in patch description.
---
 fs/xfs/libxfs/xfs_sb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Dave Chinner June 4, 2024, 11:08 p.m. UTC | #1

On Mon, Jun 03, 2024 at 11:30:11AM -0700, Wengang Wang wrote:
> A user with a completely full filesystem experienced an unexpected
> shutdown when the filesystem tried to write the superblock at
> runtime. The kernel logged the following messages:
> 
> [    8.176281] XFS (dm-4): Metadata corruption detected at xfs_sb_write_verify+0x60/0x120 [xfs], xfs_sb block 0x0
> [    8.177417] XFS (dm-4): Unmount and run xfs_repair
> [    8.178016] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
> [    8.178703] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00  XFSB............
> [    8.179487] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [    8.180312] 00000020: cf 12 dc 89 ca 26 45 29 92 e6 e3 8d 3b b8 a2 c3  .....&E)....;...
> [    8.181150] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80  ................
> [    8.182003] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82  ................
> [    8.182004] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00  .....d..........
> [    8.182004] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00  ..d.............
> [    8.182005] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19  ................
> [    8.182008] XFS (dm-4): Corruption of in-memory data detected.  Shutting down filesystem
> [    8.182010] XFS (dm-4): Please unmount the filesystem and rectify the problem(s)
> 
> When xfs_log_sb() writes the superblock to disk, sb_fdblocks is computed
> from the m_fdblocks percpu counter without any lock. Because m_fdblocks can
> briefly go from positive to negative and back to positive when the
> filesystem is full (see xfs_mod_fdblocks()), sb_fdblocks can end up
> negative. sb_fdblocks is an unsigned 64-bit (unsigned long long) field, so a
> negative value reads back as a huge number. An sb_fdblocks larger than
> sb_dblocks is a problem during log recovery, so xfs_validate_sb_write()
> rejects the superblock write.

As I explained in the previous review thread, this "summing can be
transiently negative" behaviour is "native" to percpu counters. i.e.
percpu_counter_sum() does not require the xfs_mod_fdblocks()
behaviour to return negative values because the sum's guaranteed
accuracy is only +/-(batch size * nrcpus).

Hence all the percpu_counter_sum() calls in xfs_log_sb()
need to use percpu_counter_sum_positive() to avoid logging transient
negative values to the journal, not just the mp->m_fdblocks counter.

-Dave.

Patch

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 09e4bf949bf8..252bfa9a9fdb 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1042,7 +1042,8 @@  xfs_log_sb(
 		mp->m_sb.sb_ifree = min_t(uint64_t,
 				percpu_counter_sum(&mp->m_ifree),
 				mp->m_sb.sb_icount);
-		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
+		mp->m_sb.sb_fdblocks =
+				percpu_counter_sum_positive(&mp->m_fdblocks);
 	}
 
 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
