From patchwork Wed May 29 09:51:59 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13677646
From: Zhang Yi <yi.zhang@huaweicloud.com>
Subject: [RFC PATCH v4 1/8] iomap: zeroing needs to be pagecache aware
Date: Wed, 29 May 2024 17:51:59 +0800
Message-Id: <20240529095206.2568162-2-yi.zhang@huaweicloud.com>
From: Dave Chinner

Unwritten extents can have page cache data over the range being zeroed,
so we can't just skip them entirely. Fix this by checking for an
existing dirty folio over the unwritten range we are zeroing, and only
performing zeroing if the folio is already dirty.

XXX: how do we detect an iomap containing a COW mapping over a hole in
iomap_zero_iter()? The XFS code implies this case also needs to zero the
page cache if there is data present, so the page-cache-lookup-only
trigger in iomap_zero_iter() needs to handle this case as well.

Before:

$ time sudo ./pwrite-trunc /mnt/scratch/foo 50000
path /mnt/scratch/foo, 50000 iters

real    0m14.103s
user    0m0.015s
sys     0m0.020s

$ sudo strace -c ./pwrite-trunc /mnt/scratch/foo 50000
path /mnt/scratch/foo, 50000 iters

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 85.90    0.847616          16     50000           ftruncate
 14.01    0.138229           2     50000           pwrite64
....

After:

$ time sudo ./pwrite-trunc /mnt/scratch/foo 50000
path /mnt/scratch/foo, 50000 iters

real    0m0.144s
user    0m0.021s
sys     0m0.012s

$ sudo strace -c ./pwrite-trunc /mnt/scratch/foo 50000
path /mnt/scratch/foo, 50000 iters

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 53.86    0.505964          10     50000           ftruncate
 46.12    0.433251           8     50000           pwrite64
....

Yup, we get back all the performance.

As for the "mmap write beyond EOF" data exposure aspect documented here:

https://lore.kernel.org/linux-xfs/20221104182358.2007475-1-bfoster@redhat.com/

With this command:

$ sudo xfs_io -tfc "falloc 0 1k" -c "pwrite 0 1k" \
    -c "mmap 0 4k" -c "mwrite 3k 1k" -c "pwrite 32k 4k" \
    -c fsync -c "pread -v 3k 32" /mnt/scratch/foo

Before:

wrote 1024/1024 bytes at offset 0
1 KiB, 1 ops; 0.0000 sec (34.877 MiB/sec and 35714.2857 ops/sec)
wrote 4096/4096 bytes at offset 32768
4 KiB, 1 ops; 0.0000 sec (229.779 MiB/sec and 58823.5294 ops/sec)
00000c00:  58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  XXXXXXXXXXXXXXXX
00000c10:  58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  XXXXXXXXXXXXXXXX
read 32/32 bytes at offset 3072
32.000000 bytes, 1 ops; 0.0000 sec (568.182 KiB/sec and 18181.8182 ops/sec)

After:

wrote 1024/1024 bytes at offset 0
1 KiB, 1 ops; 0.0000 sec (40.690 MiB/sec and 41666.6667 ops/sec)
wrote 4096/4096 bytes at offset 32768
4 KiB, 1 ops; 0.0000 sec (150.240 MiB/sec and 38461.5385 ops/sec)
00000c00:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000c10:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
read 32/32 bytes at offset 3072
32.000000 bytes, 1 ops; 0.0000 sec (558.036 KiB/sec and 17857.1429 ops/sec)

We see that this post-EOF unwritten extent dirty page zeroing is working
correctly.
This has passed through most of fstests on a couple of test VMs without
issues at the moment, so I think this approach to fixing the issue is
going to be solid once we've worked out how to detect the COW-hole
mapping case.

Signed-off-by: Dave Chinner
---
 fs/iomap/buffered-io.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_iops.c      | 12 +-----------
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 41c8f0c68ef5..a3a5d4eb7289 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -583,11 +583,23 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
  *
  * Returns a locked reference to the folio at @pos, or an error pointer if the
  * folio could not be obtained.
+ *
+ * Note: when zeroing unwritten extents, we might have data in the page cache
+ * over an unwritten extent. In this case, we want to do a pure lookup on the
+ * page cache and not create a new folio as we don't need to perform zeroing on
+ * unwritten extents if there is no cached data over the given range.
  */
 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
 {
 	fgf_t fgp = FGP_WRITEBEGIN | FGP_NOFS;
 
+	if (iter->flags & IOMAP_ZERO) {
+		const struct iomap *srcmap = iomap_iter_srcmap(iter);
+
+		if (srcmap->type == IOMAP_UNWRITTEN)
+			fgp &= ~FGP_CREAT;
+	}
+
 	if (iter->flags & IOMAP_NOWAIT)
 		fgp |= FGP_NOWAIT;
 	fgp |= fgf_set_order(len);
@@ -1388,7 +1400,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 	loff_t written = 0;
 
 	/* already zeroed? we're done. */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+	if (srcmap->type == IOMAP_HOLE)
 		return length;
 
 	do {
@@ -1399,8 +1411,22 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		bool ret;
 
 		status = iomap_write_begin(iter, pos, bytes, &folio);
-		if (status)
+		if (status) {
+			if (status == -ENOENT) {
+				/*
+				 * Unwritten extents need to have page cache
+				 * lookups done to determine if they have data
+				 * over them that needs zeroing. If there is no
+				 * data, we'll get -ENOENT returned here, so we
+				 * can just skip over this index.
+				 */
+				WARN_ON_ONCE(srcmap->type != IOMAP_UNWRITTEN);
+				if (bytes > PAGE_SIZE - offset_in_page(pos))
+					bytes = PAGE_SIZE - offset_in_page(pos);
+				goto loop_continue;
+			}
 			return status;
+		}
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
@@ -1408,6 +1434,17 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		if (bytes > folio_size(folio) - offset)
 			bytes = folio_size(folio) - offset;
 
+		/*
+		 * If the folio over an unwritten extent is clean (i.e. because
+		 * it has been read from), then it already contains zeros. Hence
+		 * we can just skip it.
+		 */
+		if (srcmap->type == IOMAP_UNWRITTEN &&
+		    !folio_test_dirty(folio)) {
+			folio_unlock(folio);
+			goto loop_continue;
+		}
+
 		folio_zero_range(folio, offset, bytes);
 		folio_mark_accessed(folio);
 
@@ -1416,6 +1453,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		if (WARN_ON_ONCE(!ret))
 			return -EIO;
 
+loop_continue:
 		pos += bytes;
 		length -= bytes;
 		written += bytes;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ff222827e550..d44508930b67 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -861,17 +861,7 @@ xfs_setattr_size(
 		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
 		error = xfs_zero_range(ip, oldsize, newsize - oldsize,
 				&did_zeroing);
-	} else {
-		/*
-		 * iomap won't detect a dirty page over an unwritten block (or a
-		 * cow block over a hole) and subsequently skips zeroing the
-		 * newly post-EOF portion of the page. Flush the new EOF to
-		 * convert the block before the pagecache truncate.
-		 */
-		error = filemap_write_and_wait_range(inode->i_mapping, newsize,
-				newsize);
-		if (error)
-			return error;
+	} else if (newsize != oldsize) {
 		error = xfs_truncate_page(ip, newsize, &did_zeroing);
 	}
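The pwrite-trunc program quoted in the benchmark above is not part of
this posting. A minimal userspace sketch of what such a reproducer
presumably looks like follows; the loop body, buffer size, and truncate
pattern are assumptions, chosen only to match the 50000 pwrite64 +
50000 ftruncate syscall profile in the strace output:

/*
 * Hypothetical reconstruction of pwrite-trunc: each iteration dirties
 * the EOF block with pwrite() and then truncates partway into it,
 * exercising the truncate-down zeroing path this patch changes.
 * Not the original test program.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char buf[1024];
	long iters, i;
	int fd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <path> <iters>\n", argv[0]);
		return 1;
	}
	iters = atol(argv[2]);
	printf("path %s, %ld iters\n", argv[1], iters);

	fd = open(argv[1], O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0x58, sizeof(buf));

	for (i = 0; i < iters; i++) {
		/* dirty the EOF block, then truncate down into it */
		if (pwrite(fd, buf, sizeof(buf), 0) < 0 ||
		    ftruncate(fd, sizeof(buf) / 2) < 0) {
			perror("pwrite/ftruncate");
			return 1;
		}
	}
	close(fd);
	return 0;
}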
From patchwork Wed May 29 09:52:00 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13677651
From: Zhang Yi <yi.zhang@huaweicloud.com>
Subject: [RFC PATCH v4 2/8] math64: add rem_u64() to just return the remainder
Date: Wed, 29 May 2024 17:52:00 +0800
Message-Id: <20240529095206.2568162-3-yi.zhang@huaweicloud.com>

From: Zhang Yi

Add a new helper rem_u64() that only returns the remainder of an
unsigned 64-bit division by a 32-bit divisor.

Signed-off-by: Zhang Yi
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 include/linux/math64.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/linux/math64.h b/include/linux/math64.h
index d34def7f9a8c..618df4862091 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -3,6 +3,7 @@
 #define _LINUX_MATH64_H
 
 #include <linux/types.h>
+#include <linux/log2.h>
 #include <linux/math.h>
 #include <vdso/math64.h>
 #include <asm/div64.h>
@@ -12,6 +13,20 @@
 #define div64_long(x, y) div64_s64((x), (y))
 #define div64_ul(x, y)   div64_u64((x), (y))
 
+/**
+ * rem_u64 - remainder of unsigned 64bit divide with 32bit divisor
+ * @dividend: unsigned 64bit dividend
+ * @divisor: unsigned 32bit divisor
+ *
+ * Return: dividend % divisor
+ */
+static inline u32 rem_u64(u64 dividend, u32 divisor)
+{
+	if (is_power_of_2(divisor))
+		return dividend & (divisor - 1);
+	return dividend % divisor;
+}
+
 /**
  * div_u64_rem - unsigned 64bit divide with 32bit divisor with remainder
  * @dividend: unsigned 64bit dividend
@@ -86,6 +101,15 @@ static inline s64 div64_s64(s64 dividend, s64 divisor)
 #define div64_long(x, y) div_s64((x), (y))
 #define div64_ul(x, y)   div_u64((x), (y))
 
+#ifndef rem_u64
+static inline u32 rem_u64(u64 dividend, u32 divisor)
+{
+	if (is_power_of_2(divisor))
+		return dividend & (divisor - 1);
+	return do_div(dividend, divisor);
+}
+#endif
+
 #ifndef div_u64_rem
 static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
 {
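To illustrate what rem_u64() computes, here is a standalone userspace
re-implementation (kernel types swapped for stdint ones and
is_power_of_2() open-coded; the sample offsets are arbitrary). It also
shows why the masking shortcut is only valid for power-of-two divisors,
which is what makes this helper useful for multi-block realtime
allocation units later in the series:

#include <stdint.h>
#include <stdio.h>

/* userspace stand-in for the kernel helper, for illustration only */
static uint32_t rem_u64(uint64_t dividend, uint32_t divisor)
{
	/* open-coded is_power_of_2(divisor) */
	if (divisor && !(divisor & (divisor - 1)))
		return dividend & (divisor - 1);	/* cheap mask */
	return dividend % divisor;			/* real division */
}

int main(void)
{
	uint64_t pos = 20480;	/* arbitrary file offset */

	/* 4096 is a power of two: the mask path applies, result 0 */
	printf("rem_u64(20480, 4096)  = %u\n", (unsigned)rem_u64(pos, 4096));

	/*
	 * 12288 (a 3 x 4096 allocation unit) is not a power of two:
	 * "20480 & 12287" would wrongly yield 0; the remainder is 8192.
	 */
	printf("rem_u64(20480, 12288) = %u\n", (unsigned)rem_u64(pos, 12288));
	return 0;
}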
From patchwork Wed May 29 09:52:01 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13677647
From: Zhang Yi <yi.zhang@huaweicloud.com>
Subject: [RFC PATCH v4 3/8] iomap: pass blocksize to iomap_truncate_page()
Date: Wed, 29 May 2024 17:52:01 +0800
Message-Id: <20240529095206.2568162-4-yi.zhang@huaweicloud.com>

From: Zhang Yi

iomap_truncate_page() always assumes that the block size of the
truncating inode is i_blocksize(), but this is not true for some
filesystems, e.g. XFS does extent size alignment for realtime inodes.
Drop this assumption and pass the block size for zeroing into
iomap_truncate_page(), allowing filesystems to indicate the correct
block size.

Suggested-by: Dave Chinner
Signed-off-by: Zhang Yi
---
 fs/iomap/buffered-io.c | 8 ++++----
 fs/xfs/xfs_iomap.c     | 3 ++-
 include/linux/iomap.h  | 4 ++--
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a3a5d4eb7289..30b49d6e9d48 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -17,6 +17,7 @@
 #include <linux/bio.h>
 #include <linux/sched/signal.h>
 #include <linux/migrate.h>
+#include <linux/math64.h>
 #include "trace.h"
 
 #include "../internal.h"
@@ -1483,11 +1484,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 EXPORT_SYMBOL_GPL(iomap_zero_range);
 
 int
-iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops)
+iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops)
 {
-	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
+	unsigned int off = rem_u64(pos, blocksize);
 	/* Block boundary? Nothing to do */
 	if (!off)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 378342673925..32306804b01b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1471,10 +1471,11 @@ xfs_truncate_page(
 	bool			*did_zero)
 {
 	struct inode		*inode = VFS_I(ip);
+	unsigned int		blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
 		return dax_truncate_page(inode, pos, did_zero,
 				&xfs_dax_write_iomap_ops);
-	return iomap_truncate_page(inode, pos, did_zero,
+	return iomap_truncate_page(inode, pos, blocksize, did_zero,
 			&xfs_buffered_write_iomap_ops);
 }
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d..d67bf86ec582 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -273,8 +273,8 @@ int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops);
-int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops);
+int iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops);
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
 		const struct iomap_ops *ops);
 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
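A small self-contained illustration of what the new blocksize parameter
changes: iomap_truncate_page() zeroes from the new EOF to the end of the
enclosing "block", so passing a multi-block allocation unit widens the
zeroed tail. The sketch below just prints that range for hypothetical
sizes; it mirrors the rem_u64() boundary check above rather than any
real iomap code:

#include <stdint.h>
#include <stdio.h>

static uint32_t rem_u64(uint64_t dividend, uint32_t divisor)
{
	if (divisor && !(divisor & (divisor - 1)))
		return dividend & (divisor - 1);
	return dividend % divisor;
}

/* print the tail range a truncate to @newsize would have to zero */
static void tail_zero_range(uint64_t newsize, uint32_t blocksize)
{
	uint32_t off = rem_u64(newsize, blocksize);

	if (!off)	/* block boundary? nothing to do */
		printf("EOF %8llu, unit %5u: aligned, no zeroing\n",
		       (unsigned long long)newsize, blocksize);
	else
		printf("EOF %8llu, unit %5u: zero [%llu, %llu)\n",
		       (unsigned long long)newsize, blocksize,
		       (unsigned long long)newsize,
		       (unsigned long long)(newsize - off + blocksize));
}

int main(void)
{
	tail_zero_range(4096, 4096);	/* aligned: nothing to do */
	tail_zero_range(3072, 4096);	/* regular file: zero to 4096 */
	tail_zero_range(3072, 12288);	/* rt-style unit: zero to 12288 */
	return 0;
}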
From patchwork Wed May 29 09:52:02 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13677648
From: Zhang Yi <yi.zhang@huaweicloud.com>
Subject: [RFC PATCH v4 4/8] fsdax: pass blocksize to dax_truncate_page()
Date: Wed, 29 May 2024 17:52:02 +0800
Message-Id: <20240529095206.2568162-5-yi.zhang@huaweicloud.com>

From: Zhang Yi

dax_truncate_page() always assumes that the block size of the truncating
inode is i_blocksize(), but this is not true for some filesystems, e.g.
XFS does extent size alignment for realtime inodes. Drop this assumption
and pass the block size for zeroing into dax_truncate_page(), allowing
filesystems to indicate the correct block size.

Suggested-by: Dave Chinner
Signed-off-by: Zhang Yi
---
 fs/dax.c            | 8 ++++----
 fs/ext2/inode.c     | 4 ++--
 fs/xfs/xfs_iomap.c  | 2 +-
 include/linux/dax.h | 4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index becb4a6920c6..4cbd94fd96ed 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -25,6 +25,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/iomap.h>
 #include <linux/rmap.h>
+#include <linux/math64.h>
 #include <asm/pgalloc.h>
 
 #define CREATE_TRACE_POINTS
@@ -1403,11 +1404,10 @@ int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 }
 EXPORT_SYMBOL_GPL(dax_zero_range);
 
-int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops)
+int dax_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops)
 {
-	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
+	unsigned int off = rem_u64(pos, blocksize);
 	/* Block boundary? Nothing to do */
 	if (!off)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 0caa1650cee8..337349c94adf 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1276,8 +1276,8 @@ static int ext2_setsize(struct inode *inode, loff_t newsize)
 	inode_dio_wait(inode);
 
 	if (IS_DAX(inode))
-		error = dax_truncate_page(inode, newsize, NULL,
-					  &ext2_iomap_ops);
+		error = dax_truncate_page(inode, newsize, i_blocksize(inode),
+					  NULL, &ext2_iomap_ops);
 	else
 		error = block_truncate_page(inode->i_mapping,
 					    newsize, ext2_get_block);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 32306804b01b..8cdfcbb5baa7 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1474,7 +1474,7 @@ xfs_truncate_page(
 	unsigned int		blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
-		return dax_truncate_page(inode, pos, did_zero,
+		return dax_truncate_page(inode, pos, blocksize, did_zero,
 				&xfs_dax_write_iomap_ops);
 	return iomap_truncate_page(inode, pos, blocksize, did_zero,
 			&xfs_buffered_write_iomap_ops);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9d3e3327af4c..4aa8ef7c8fd4 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -210,8 +210,8 @@ int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops);
-int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops);
+int dax_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops);
 
 #if IS_ENABLED(CONFIG_DAX)
 int dax_read_lock(void);
From patchwork Wed May 29 09:52:03 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13677650
From: Zhang Yi <yi.zhang@huaweicloud.com>
Subject: [RFC PATCH v4 5/8] xfs: refactor the truncating order
Date: Wed, 29 May 2024 17:52:03 +0800
Message-Id: <20240529095206.2568162-6-yi.zhang@huaweicloud.com>

From: Zhang Yi

When truncating down an inode, we call xfs_truncate_page() to zero out
the tail partial block beyond the new EOF, which prevents exposing
stale data. But xfs_truncate_page() always assumes the blocksize is
i_blocksize(inode), which is not true if the file has a large
allocation unit that we should align to; e.g. a realtime inode should
be aligned to the rtextsize.

The current xfs_setattr_size() can't zero out a large alignment size on
truncate down because the order of operations is wrong: we first zero
out through xfs_truncate_page(), and then immediately update the inode
size through truncate_setsize(). If the zeroed range is larger than a
folio, the writeback path would not write back the zeroed page cache
beyond the EOF folio, so it doesn't write zeroes to the entire tail
extent and could expose stale data after an appending write into the
next aligned extent.

Adjust the order to: zero out the tail aligned blocks, write back the
zeroed or cached data, then update i_size and drop the cache beyond the
aligned EOF block. This prepares for the realtime inode fix and for the
upcoming forced alignment feature.
Signed-off-by: Zhang Yi
---
 fs/xfs/xfs_iomap.c |   2 +-
 fs/xfs/xfs_iomap.h |   3 +-
 fs/xfs/xfs_iops.c  | 107 ++++++++++++++++++++++++++++-----------------
 3 files changed, 69 insertions(+), 43 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 8cdfcbb5baa7..0369b64cc3f4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1468,10 +1468,10 @@ int
 xfs_truncate_page(
 	struct xfs_inode	*ip,
 	loff_t			pos,
+	unsigned int		blocksize,
 	bool			*did_zero)
 {
 	struct inode		*inode = VFS_I(ip);
-	unsigned int		blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
 		return dax_truncate_page(inode, pos, blocksize, did_zero,
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 4da13440bae9..feb1610cb645 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -25,7 +25,8 @@ int xfs_bmbt_to_iomap(struct xfs_inode *ip, struct iomap *iomap,
 
 int xfs_zero_range(struct xfs_inode *ip, loff_t pos, loff_t len,
 		bool *did_zero);
-int xfs_truncate_page(struct xfs_inode *ip, loff_t pos, bool *did_zero);
+int xfs_truncate_page(struct xfs_inode *ip, loff_t pos,
+		unsigned int blocksize, bool *did_zero);
 
 static inline xfs_filblks_t
 xfs_aligned_fsb_count(
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d44508930b67..d24927075022 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -812,6 +812,7 @@ xfs_setattr_size(
 	int			error;
 	uint			lock_flags = 0;
 	bool			did_zeroing = false;
+	bool			write_back = false;
 
 	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
 	ASSERT(S_ISREG(inode->i_mode));
@@ -853,30 +854,7 @@ xfs_setattr_size(
 	 * the transaction because the inode cannot be unlocked once it is a
 	 * part of the transaction.
 	 *
-	 * Start with zeroing any data beyond EOF that we may expose on file
-	 * extension, or zeroing out the rest of the block on a downward
-	 * truncate.
-	 */
-	if (newsize > oldsize) {
-		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
-		error = xfs_zero_range(ip, oldsize, newsize - oldsize,
-				&did_zeroing);
-	} else if (newsize != oldsize) {
-		error = xfs_truncate_page(ip, newsize, &did_zeroing);
-	}
-
-	if (error)
-		return error;
-
-	/*
-	 * We've already locked out new page faults, so now we can safely remove
-	 * pages from the page cache knowing they won't get refaulted until we
-	 * drop the XFS_MMAP_EXCL lock after the extent manipulations are
-	 * complete. The truncate_setsize() call also cleans partial EOF page
-	 * PTEs on extending truncates and hence ensures sub-page block size
-	 * filesystems are correctly handled, too.
-	 *
-	 * We have to do all the page cache truncate work outside the
+	 * And we have to do all the page cache truncate work outside the
 	 * transaction context as the "lock" order is page lock->log space
 	 * reservation as defined by extent allocation in the writeback path.
 	 * Hence a truncate can fail with ENOMEM from xfs_trans_alloc(), but
@@ -884,27 +862,74 @@ xfs_setattr_size(
 	 * user visible changes). There's not much we can do about this, except
 	 * to hope that the caller sees ENOMEM and retries the truncate
 	 * operation.
-	 *
-	 * And we update in-core i_size and truncate page cache beyond newsize
-	 * before writeback the [i_disk_size, newsize] range, so we're
-	 * guaranteed not to write stale data past the new EOF on truncate down.
 	 */
-	truncate_setsize(inode, newsize);
+	write_back = newsize > ip->i_disk_size && oldsize != ip->i_disk_size;
+	if (newsize < oldsize) {
+		unsigned int blocksize = i_blocksize(inode);
 
-	/*
-	 * We are going to log the inode size change in this transaction so
-	 * any previous writes that are beyond the on disk EOF and the new
-	 * EOF that have not been written out need to be written here. If we
-	 * do not write the data out, we expose ourselves to the null files
-	 * problem. Note that this includes any block zeroing we did above;
-	 * otherwise those blocks may not be zeroed after a crash.
-	 */
-	if (did_zeroing ||
-	    (newsize > ip->i_disk_size && oldsize != ip->i_disk_size)) {
-		error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
-						ip->i_disk_size, newsize - 1);
+		/*
+		 * Zeroing out the partial EOF block and the rest of the extra
+		 * aligned blocks on a downward truncate.
+		 */
+		error = xfs_truncate_page(ip, newsize, blocksize, &did_zeroing);
 		if (error)
 			return error;
+
+		/*
+		 * We are going to log the inode size change in this transaction
+		 * so any previous writes that are beyond the on disk EOF and
+		 * the new EOF that have not been written out need to be written
+		 * here. If we do not write the data out, we expose ourselves
+		 * to the null files problem. Note that this includes any block
+		 * zeroing we did above; otherwise those blocks may not be
+		 * zeroed after a crash.
+		 */
+		if (did_zeroing || write_back) {
+			error = filemap_write_and_wait_range(inode->i_mapping,
+					min_t(loff_t, ip->i_disk_size, newsize),
+					roundup_64(newsize, blocksize) - 1);
+			if (error)
+				return error;
+		}
+
+		/*
+		 * Update i_size after writing back to make sure the zeroed
+		 * blocks have been written out, and drop all the page cache
+		 * beyond the blocksize-aligned new EOF block.
+		 *
+		 * We've already locked out new page faults, so now we can
+		 * safely remove pages from the page cache knowing they won't
+		 * get refaulted until we drop the XFS_MMAP_EXCL lock after the
+		 * extent manipulations are complete.
+		 */
+		i_size_write(inode, newsize);
+		truncate_pagecache(inode, roundup_64(newsize, blocksize));
+	} else {
+		/*
+		 * Start with zeroing any data beyond EOF that we may expose on
+		 * file extension.
+		 */
+		if (newsize > oldsize) {
+			trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
+			error = xfs_zero_range(ip, oldsize, newsize - oldsize,
+					&did_zeroing);
+			if (error)
+				return error;
+		}
+
+		/*
+		 * The truncate_setsize() call also cleans partial EOF page
+		 * PTEs on extending truncates and hence ensures sub-page block
+		 * size filesystems are correctly handled, too.
+		 */
+		truncate_setsize(inode, newsize);
+
+		if (did_zeroing || write_back) {
+			error = filemap_write_and_wait_range(inode->i_mapping,
+					ip->i_disk_size, newsize - 1);
+			if (error)
+				return error;
+		}
+	}
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
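The ranges the reordered truncate-down path computes can be
sanity-checked in isolation. The sketch below reproduces the
min_t()/roundup_64() arithmetic from the hunk above for one hypothetical
geometry (1 MiB on-disk size, 4 KiB new size, 64 KiB allocation unit);
the numbers are made up for illustration:

#include <stdint.h>
#include <stdio.h>

/* userspace stand-ins for roundup_64() and min_t(loff_t, ...) */
static uint64_t roundup_64(uint64_t x, uint32_t unit)
{
	return ((x + unit - 1) / unit) * unit;
}

static uint64_t min64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

int main(void)
{
	uint64_t disk_size = 1048576;	/* i_disk_size: 1 MiB on disk */
	uint64_t newsize   = 4096;	/* truncate down to 4 KiB */
	uint32_t blocksize = 65536;	/* 64 KiB allocation unit */

	/* writeback range used before logging the size change */
	printf("write back [%llu, %llu]\n",
	       (unsigned long long)min64(disk_size, newsize),
	       (unsigned long long)(roundup_64(newsize, blocksize) - 1));

	/* page cache is only dropped beyond the aligned new EOF */
	printf("truncate_pagecache from %llu\n",
	       (unsigned long long)roundup_64(newsize, blocksize));
	return 0;
}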
From patchwork Wed May 29 09:52:04 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13677649
From: Zhang Yi <yi.zhang@huaweicloud.com>
Subject: [RFC PATCH v4 6/8] xfs: correct the truncate blocksize of realtime inode
Date: Wed, 29 May 2024 17:52:04 +0800
Message-Id: <20240529095206.2568162-7-yi.zhang@huaweicloud.com>

From: Zhang Yi

When truncating down a realtime file whose sb_rextsize is bigger than
one block at an unaligned offset, xfs_truncate_page() only zeroes out
the tail EOF block, which could expose stale data since commit
943bc0882ceb ("iomap: don't increase i_size if it's not a write
operation").

If we truncate a file that contains a large enough written extent:

     |<    rtext    >|<    rtext    >|
  ...WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
        ^ (new EOF)            ^ old EOF

Since we only zero out the tail of the EOF block, and
xfs_itruncate_extents() unmaps the whole aligned extents, it becomes
this state:

     |<    rtext    >|
  ...WWWzWWWWWWWWWWWWW
        ^ new EOF

Then if we do an extending write like this, the blocks in the previous
tail extent become stale:

     |<    rtext    >|
  ...WWWzSSSSSSSSSSSSS..........WWWWWWWWWWWWWWWWW
        ^ old EOF             ^ append start
                                        ^ new EOF

Fix this by zeroing out the tail allocation unit, and also make sure
xfs_itruncate_extents() unmaps rtextsize-aligned extents.

Fixes: 943bc0882ceb ("iomap: don't increase i_size if it's not a write operation")
Reported-by: Chandan Babu R
Link: https://lore.kernel.org/linux-xfs/0b92a215-9d9b-3788-4504-a520778953c2@huaweicloud.com
Signed-off-by: Zhang Yi
---
 fs/xfs/xfs_inode.c | 3 +++
 fs/xfs/xfs_iops.c  | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 58fb7a5062e1..db35167acef6 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -35,6 +35,7 @@
 #include "xfs_trans_priv.h"
 #include "xfs_log.h"
 #include "xfs_bmap_btree.h"
+#include "xfs_rtbitmap.h"
 #include "xfs_reflink.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
@@ -1512,6 +1513,8 @@ xfs_itruncate_extents_flags(
 	 * the page cache can't scale that far.
 	 */
 	first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
+	if (xfs_inode_has_bigrtalloc(ip))
+		first_unmap_block = xfs_rtb_roundup_rtx(mp, first_unmap_block);
 	if (!xfs_verify_fileoff(mp, first_unmap_block)) {
 		WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
 		return 0;
 	}
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d24927075022..ec7b7bdf8825 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -865,7 +865,7 @@ xfs_setattr_size(
 	 */
 	write_back = newsize > ip->i_disk_size && oldsize != ip->i_disk_size;
 	if (newsize < oldsize) {
-		unsigned int blocksize = i_blocksize(inode);
+		unsigned int blocksize = xfs_inode_alloc_unitsize(ip);
 
 		/*
 		 * Zeroing out the partial EOF block and the rest of the extra
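The effect of the new rounding on first_unmap_block can be shown with
plain arithmetic. The sketch below assumes a hypothetical 4 KiB block
size and a 16-block realtime extent; xfs_rtb_roundup_rtx() is modeled as
a simple round-up to the extent size:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t blksz = 4096;		/* filesystem block size */
	uint64_t rtextsize = 16;	/* rt extent size, in blocks */
	uint64_t new_size = 12288;	/* new EOF, in bytes */

	uint64_t first_unmap = new_size / blksz;	/* XFS_B_TO_FSB() */
	/* xfs_rtb_roundup_rtx(): round up to a whole rt extent */
	uint64_t rounded = (first_unmap + rtextsize - 1) / rtextsize
				* rtextsize;

	printf("unmap starts at block %llu instead of %llu;\n"
	       "blocks %llu..%llu stay mapped and are zeroed by\n"
	       "xfs_truncate_page() with the allocation-unit blocksize\n",
	       (unsigned long long)rounded,
	       (unsigned long long)first_unmap,
	       (unsigned long long)first_unmap,
	       (unsigned long long)(rounded - 1));
	return 0;
}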
From patchwork Wed May 29 09:52:05 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13677652
From: Zhang Yi <yi.zhang@huaweicloud.com>
Subject: [RFC PATCH v4 7/8] xfs: reserve blocks for truncating realtime inode
Date: Wed, 29 May 2024 17:52:05 +0800
Message-Id: <20240529095206.2568162-8-yi.zhang@huaweicloud.com>

From: Zhang Yi

On a realtime inode, __xfs_bunmapi() can convert the unaligned extra
blocks to unwritten state, but it can't work as expected on truncate
down because the reserved block count is zero in xfs_setattr_size().
Fix this by reserving XFS_DIOSTRAT_SPACE_RES blocks for realtime
inodes.

Signed-off-by: Zhang Yi
---
 fs/xfs/xfs_iops.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ec7b7bdf8825..c53de5e6ef66 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -17,6 +17,8 @@
 #include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trans.h"
+#include "xfs_trans_space.h"
+#include "xfs_bmap_btree.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_symlink.h"
@@ -811,6 +813,7 @@ xfs_setattr_size(
 	struct xfs_trans	*tp;
 	int			error;
 	uint			lock_flags = 0;
+	uint			resblks;
 	bool			did_zeroing = false;
 	bool			write_back = false;
@@ -932,7 +935,9 @@ xfs_setattr_size(
 		}
 	}
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+	resblks = XFS_IS_REALTIME_INODE(ip) ? XFS_DIOSTRAT_SPACE_RES(mp, 0) : 0;
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
+			0, 0, &tp);
 	if (error)
 		return error;
From patchwork Wed May 29 09:52:06 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13677653
From: Zhang Yi <yi.zhang@huaweicloud.com>
Subject: [RFC PATCH v4 8/8] xfs: improve truncate on a realtime inode with huge extsize
Date: Wed, 29 May 2024 17:52:06 +0800
Message-Id: <20240529095206.2568162-9-yi.zhang@huaweicloud.com>

From: Zhang Yi

If we truncate down a realtime inode whose extsize is very large,
zeroing out the entire aligned EOF extent could be very slow.
Fortunately, __xfs_bunmapi() aligns the unmapped range to rtextsize,
and splits and converts the extra blocks to unwritten state. So, if the
rtextsize is large enough, adjust the zeroing blocksize down to the
filesystem block size and let __xfs_bunmapi() convert the tail blocks
to unwritten; this improves truncate performance significantly:

 # mkfs.xfs -f -rrtdev=/dev/pmem1s -f -m reflink=0,rmapbt=0, \
   -d rtinherit=1 -r extsize=1G /dev/pmem2s
 # for i in {1..5}; \
   do dd if=/dev/zero of=/mnt/scratch/$i bs=1M count=1024; done
 # sync
 # time for i in {1..5}; \
   do xfs_io -c "truncate 4k" /mnt/scratch/$i; done

Before:
 real   0m16.762s
 user   0m0.008s
 sys    0m16.750s

After:
 real   0m0.076s
 user   0m0.010s
 sys    0m0.069s

Signed-off-by: Zhang Yi
---
 fs/xfs/xfs_inode.c |  2 +-
 fs/xfs/xfs_inode.h | 12 ++++++++++++
 fs/xfs/xfs_iops.c  |  9 +++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index db35167acef6..c0c1ab310aae 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1513,7 +1513,7 @@ xfs_itruncate_extents_flags(
 	 * the page cache can't scale that far.
 	 */
 	first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
-	if (xfs_inode_has_bigrtalloc(ip))
+	if (xfs_inode_has_bigrtalloc(ip) && !xfs_inode_has_hugertalloc(ip))
 		first_unmap_block = xfs_rtb_roundup_rtx(mp, first_unmap_block);
 	if (!xfs_verify_fileoff(mp, first_unmap_block)) {
 		WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 292b90b5f2ac..4eed5b0c57c0 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -320,6 +320,18 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
 	return XFS_IS_REALTIME_INODE(ip) && ip->i_mount->m_sb.sb_rextsize > 1;
 }
 
+/*
+ * Decide if this file is a realtime file whose data allocation unit is larger
+ * than default.
+ */
+static inline bool xfs_inode_has_hugertalloc(struct xfs_inode *ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	return XFS_IS_REALTIME_INODE(ip) &&
+	       mp->m_sb.sb_rextsize > XFS_B_TO_FSB(mp, XFS_DFL_RTEXTSIZE);
+}
+
 /*
  * Return the buftarg used for data allocations on a given inode.
  */
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index c53de5e6ef66..d5fc84e5a37c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -870,6 +870,15 @@ xfs_setattr_size(
 	if (newsize < oldsize) {
 		unsigned int blocksize = xfs_inode_alloc_unitsize(ip);
 
+		/*
+		 * If the extsize is too large on a realtime inode, zeroing
+		 * out the entire aligned EOF extent could be slow; adjust
+		 * the blocksize to the filesystem blocksize and let
+		 * __xfs_bunmapi() convert the tail blocks to unwritten.
+		 */
+		if (xfs_inode_has_hugertalloc(ip))
+			blocksize = i_blocksize(inode);
+
 		/*
 		 * Zeroing out the partial EOF block and the rest of the extra
 		 * aligned blocks on a downward truncate.
 		 */