From patchwork Mon Aug 26 03:10:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zizhi Wo X-Patchwork-Id: 13777031 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A300329402; Mon, 26 Aug 2024 03:15:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.188 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724642105; cv=none; b=UExXDLWGeM7DcFT+kKTb7pg6eykiZSnVlcoSobM1yib514vyO1ovlBZKObYZzBpbzfawR5Pao2T3R+6UyiuzteHQ5A3RxjOdcZids5EYZlnbaWNE8mCIHISl+niSaqZu4kG0rEX8UVzWjTpW93j72ZXAN/BfXmA1pqhvXZ1NSyc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724642105; c=relaxed/simple; bh=UOnEWI2vYehTIUFlA0YPaT6gpPjpP1D3Tu02J2koLME=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=slWbkzd94ScwcBeg/lY1jbsoBnV6/YC/iDE54Y/z2o/RV2fdvYJxLCaTn5m6oOfh4BxL5QchSWyKX3cU/HKh7ZjMkFmCnl4eGSLhwHfLgzJygjgYZ1RYZ8eC2/TGhENeAUJWA28f8a2xk2fHwYk9Izj5lXQ9lannUfaDswB+Y2Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=45.249.212.188 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.162.254]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4WsbMt35R1zhYRj; Mon, 26 Aug 2024 11:12:58 +0800 (CST) Received: from kwepemf100017.china.huawei.com (unknown [7.202.181.16]) by mail.maildlp.com (Postfix) with ESMTPS id 36C811800D0; Mon, 26 Aug 2024 11:15:01 +0800 (CST) Received: from localhost.localdomain (10.175.104.67) by kwepemf100017.china.huawei.com (7.202.181.16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Mon, 26 Aug 2024 11:15:00 +0800 From: Zizhi Wo To: , , , , CC: , , , Subject: [PATCH 1/2] xfs: Fix missing block calculations in xfs datadev fsmap Date: Mon, 26 Aug 2024 11:10:04 +0800 Message-ID: <20240826031005.2493150-2-wozizhi@huawei.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240826031005.2493150-1-wozizhi@huawei.com> References: <20240826031005.2493150-1-wozizhi@huawei.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To kwepemf100017.china.huawei.com (7.202.181.16) In xfs datadev fsmap query, I noticed a missing block calculation problem: [root@fedora ~]# xfs_db -r -c "sb 0" -c "p" /dev/vdb magicnum = 0x58465342 blocksize = 4096 dblocks = 5242880 ...... [root@fedora ~]# xfs_io -c 'fsmap -vvvv' /mnt ... 30: 253:16 [31457384..41943031]: free space 3 (104..10485751) 10485648 (41943031 + 1) / 8 = 5242879 != 5242880 We missed one block in our fsmap calculation! The root cause of the problem lies in __xfs_getfsmap_datadev(), where the calculation of "end_fsb" requires a classification discussion. If "end_fsb" is calculated based on "eofs", we need to add an extra sentinel node for subsequent length calculations. Otherwise, one block will be missed. If "end_fsb" is calculated based on "keys[1]", then there is no need to add an extra node. Because "keys[1]" itself is unreachable, it cancels out one of the additions. The diagram below illustrates this: |0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|-----eofs |---------------|---------------------| a n b n+1 c Assume that eofs is 16, the start address of the previous query is block n, sector 0, and the length is 1, so the "info->next" is at point b, sector 8. In the last query, suppose the "rm_startblock" calculated based on "eofs - 1" is the last block n+1 at point b. All we get is the starting address of the block, not the end. Therefore, an additional sentinel node needs to be added to move it to point c. After that, subtracting one from the other will yield the remaining 1. Although we can now calculate the exact last query using "info->end_daddr", we will still get an incorrect value if the device at this point is not the boundary device specified by "keys[1]", as "end_daddr" is still the initial value. Therefore, the eofs situation here needs to be corrected. The issue is resolved by adding a sentinel node. Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Zizhi Wo --- fs/xfs/xfs_fsmap.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 85dbb46452ca..8a2dfe96dae7 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -596,12 +596,27 @@ __xfs_getfsmap_datadev( xfs_agnumber_t end_ag; uint64_t eofs; int error = 0; + int sentinel = 0; eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks); if (keys[0].fmr_physical >= eofs) return 0; start_fsb = XFS_DADDR_TO_FSB(mp, keys[0].fmr_physical); - end_fsb = XFS_DADDR_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); + /* + * For the case of eofs, we need to add a sentinel node; + * otherwise, one block will be missed when calculating the length + * in the last query. + * For the case of key[1], there is no need to add a sentinel node + * because it already represents a value that cannot be reached. + * For the case where key[1] after shifting is within the same + * block as the starting address, it is resolved using end_daddr. + */ + if (keys[1].fmr_physical > eofs - 1) { + sentinel = 1; + end_fsb = XFS_DADDR_TO_FSB(mp, eofs - 1); + } else { + end_fsb = XFS_DADDR_TO_FSB(mp, keys[1].fmr_physical); + } /* * Convert the fsmap low/high keys to AG based keys. Initialize @@ -649,7 +664,7 @@ __xfs_getfsmap_datadev( info->pag = pag; if (pag->pag_agno == end_ag) { info->high.rm_startblock = XFS_FSB_TO_AGBNO(mp, - end_fsb); + end_fsb) + sentinel; info->high.rm_offset = XFS_BB_TO_FSBT(mp, keys[1].fmr_offset); error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); From patchwork Mon Aug 26 03:10:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zizhi Wo X-Patchwork-Id: 13777032 Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9740C3BBE0; Mon, 26 Aug 2024 03:15:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.191 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724642108; cv=none; b=X57smyiUVIxWNT9zZziL3Cb4qFbJlxzRJNmVRJ6tkBAr7FIRjJasQrwvrrQCESOEtXbNswDgXWEtfpER8bhHkzRPDh4x+odgmj/dCWRzPNENxvJWpsp+gtoaF7Xaxq7083LG78/H5Hzy22nI96aKh5EUUm039ImZFRrcBe94lNM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724642108; c=relaxed/simple; bh=DuYpXsvG9S0qrtyQKMASfZ/A2YRxAXluE+7qk7APOr4=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=MLcQZztemzLrX374y4suUNadFoz0bplpDTrZXWakqq0h4jl5C2eWbV0qNk96ozhVg12ds0bKhSjaDC7xxofXmM89abydXkgrRCYIjDw24kheEEmUKzxgwvvXhQGmipVNEsipwxcIlEVa+g4WM9an0YAIetT5pjvqLLy82s5TeCM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=45.249.212.191 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.163.17]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4WsbQ55pMRz1j7DN; Mon, 26 Aug 2024 11:14:53 +0800 (CST) Received: from kwepemf100017.china.huawei.com (unknown [7.202.181.16]) by mail.maildlp.com (Postfix) with ESMTPS id EC4671A0188; Mon, 26 Aug 2024 11:15:01 +0800 (CST) Received: from localhost.localdomain (10.175.104.67) by kwepemf100017.china.huawei.com (7.202.181.16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Mon, 26 Aug 2024 11:15:01 +0800 From: Zizhi Wo To: , , , , CC: , , , Subject: [PATCH 2/2] xfs: Fix incorrect parameter calculation in rt fsmap Date: Mon, 26 Aug 2024 11:10:05 +0800 Message-ID: <20240826031005.2493150-3-wozizhi@huawei.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240826031005.2493150-1-wozizhi@huawei.com> References: <20240826031005.2493150-1-wozizhi@huawei.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To kwepemf100017.china.huawei.com (7.202.181.16) I noticed a bug related to xfs realtime device fsmap: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -r' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:48 [0..7]: unknown 8 1: 253:48 [8..1048575]: free space 1048568 2: 253:48 [1048576..1050623]: unknown 2048 3: 253:48 [1050624..2097151]: free space 1046528 Bug: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -r 1050621 1050621' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:48 [1050621..1050623]: unknown 3 1: 253:48 [1050624..1050631]: free space 8 Normally, we should not get any results, but we do get two queries. The root cause of this problem lies in the calculation of "end_rtb" in xfs_getfsmap_rtdev_rtbitmap(), which uses XFS_BB_TO_FSB method (round up). However, in the subsequent call to xfs_rtalloc_query_range(), "high_rec" calculated based on "end_rtb" has a semantic meaning of being reachable within the loop. The first round of the loop in xfs_rtalloc_query_range() doesn't find any free extents. But after incrementing "rtstart" by 1, start still does not exceed "high_key", and the second round of the loop entered. It finds free extent and obtains the first unknown extent by subtracting it from "info->next_daddr". Even though we can accurately handle it through "info->end_daddr", two incorrect extents has already been returned before the last query. The main call stack is as follows: xfs_getfsmap_rtdev_rtbitmap // rounded up end_rtb = XFS_BB_TO_FSB(..., keys[1].fmr_physical) ahigh.ar_startext = xfs_rtb_to_rtxup(mp, end_rtb) xfs_rtalloc_query_range // high_key is calculated based on end_rtb high_key = min(high_rec->ar_startext, ...) while (rtstart <= high_key) // First loop, doesn't find free extent xfs_rtcheck_range rtstart = rtend + 1 // Second loop, the free extent outside the query interval is found xfs_getfsmap_rtdev_rtbitmap_helper // unknown and free were printed out together in the second round xfs_getfsmap_helper The issue is resolved by adjusting the relevant calculations. Both the loop exit condition in the xfs_rtalloc_query_range() and the length calculation condition (high_key - start + 1) in the xfs_rtfind_forw() reflect the open interval semantics of "high_key". Therefore, when calculating "end_rtb", XFS_BB_TO_FSBT is used. In addition, in order to satisfy the close interval semantics, "key[1].fmr_physical" needs to be decremented by 1. For the non-eofs case, there is no need to worry about over-counting because we can accurately count the block number through "info->end_daddr". After applying this patch, the above problem have been solved: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -r 1050621 1050621' /mnt [root@fedora ~]# Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap") Signed-off-by: Zizhi Wo --- fs/xfs/libxfs/xfs_rtbitmap.c | 4 +--- fs/xfs/xfs_fsmap.c | 20 +++++++++++++++++--- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 386b672c5058..7af4e7afda7d 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1034,8 +1034,7 @@ xfs_rtalloc_query_range( if (low_rec->ar_startext > high_rec->ar_startext) return -EINVAL; - if (low_rec->ar_startext >= mp->m_sb.sb_rextents || - low_rec->ar_startext == high_rec->ar_startext) + if (low_rec->ar_startext >= mp->m_sb.sb_rextents) return 0; high_key = min(high_rec->ar_startext, mp->m_sb.sb_rextents - 1); @@ -1057,7 +1056,6 @@ xfs_rtalloc_query_range( if (is_free) { rec.ar_startext = rtstart; rec.ar_extcount = rtend - rtstart + 1; - error = fn(mp, tp, &rec, priv); if (error) break; diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 8a2dfe96dae7..42c4b94b0493 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -515,11 +515,20 @@ xfs_getfsmap_rtdev_rtbitmap( int error; eofs = XFS_FSB_TO_BB(mp, xfs_rtx_to_rtb(mp, mp->m_sb.sb_rextents)); - if (keys[0].fmr_physical >= eofs) + if (keys[0].fmr_physical >= eofs || + keys[0].fmr_physical == keys[1].fmr_physical) return 0; start_rtb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical + keys[0].fmr_length); - end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); + /* + * The passed keys[1] is an unreachable value, while "end_rtb" is used + * to calculate "ahigh.ar_startext", serving as an input parameter for + * xfs_rtalloc_query_range(), which is a value that can be reached. + * Therefore, it is necessary to use "keys[1].fmr_physical - 1" here. + * And because of the semantics of "end_rtb", it needs to be + * supplemented by 1 in the last calculation. + */ + end_rtb = XFS_BB_TO_FSBT(mp, min(eofs - 1, keys[1].fmr_physical - 1)); info->missing_owner = XFS_FMR_OWN_UNKNOWN; @@ -549,9 +558,14 @@ xfs_getfsmap_rtdev_rtbitmap( /* * Report any gaps at the end of the rtbitmap by simulating a null * rmap starting at the block after the end of the query range. + * For the boundary case of eofs, we need to increment the count + * by 1 to prevent omission in block statistics. + * For the boundary case of non-eofs, even if incrementing by 1 + * may lead to over-counting, it doesn't matter because it is + * handled by "info->end_daddr" in this situation, not "ahigh". */ info->last = true; - ahigh.ar_startext = min(mp->m_sb.sb_rextents, ahigh.ar_startext); + ahigh.ar_startext = min(mp->m_sb.sb_rextents, ahigh.ar_startext + 1); error = xfs_getfsmap_rtdev_rtbitmap_helper(mp, tp, &ahigh, info); if (error)