From patchwork Thu Jun 13 09:00:26 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13696455
From: Zhang Yi
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, djwong@kernel.org, hch@infradead.org,
    brauner@kernel.org, david@fromorbit.com, chandanbabu@kernel.org,
    jack@suse.cz, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [PATCH -next v5 1/8] math64: add rem_u64() to just return the remainder
Date: Thu, 13 Jun 2024 17:00:26 +0800
Message-Id: <20240613090033.2246907-2-yi.zhang@huaweicloud.com>
In-Reply-To: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
References: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
From: Zhang Yi

Add a new helper, rem_u64(), that returns just the remainder of an
unsigned 64-bit division by a 32-bit divisor.

Signed-off-by: Zhang Yi
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 include/linux/math64.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/linux/math64.h b/include/linux/math64.h
index d34def7f9a8c..efbe58c157e3 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -3,6 +3,7 @@
 #define _LINUX_MATH64_H
 
 #include <linux/types.h>
+#include <linux/log2.h>
 #include <linux/math.h>
 #include <vdso/math64.h>
 #include <asm/div64.h>
@@ -12,6 +13,20 @@
 #define div64_long(x, y) div64_s64((x), (y))
 #define div64_ul(x, y)   div64_u64((x), (y))
 
+/**
+ * rem_u64 - remainder of unsigned 64bit divide with 32bit divisor
+ * @dividend: unsigned 64bit dividend
+ * @divisor: unsigned 32bit divisor
+ *
+ * Return: dividend % divisor
+ */
+static inline u32 rem_u64(u64 dividend, u32 divisor)
+{
+	if (is_power_of_2(divisor))
+		return dividend & (divisor - 1);
+	return dividend % divisor;
+}
+
 /**
  * div_u64_rem - unsigned 64bit divide with 32bit divisor with remainder
  * @dividend: unsigned 64bit dividend
@@ -86,6 +101,13 @@ static inline s64 div64_s64(s64 dividend, s64 divisor)
 #define div64_long(x, y) div_s64((x), (y))
 #define div64_ul(x, y)   div_u64((x), (y))
 
+static inline u32 rem_u64(u64 dividend, u32 divisor)
+{
+	if (is_power_of_2(divisor))
+		return dividend & (divisor - 1);
+	return do_div(dividend, divisor);
+}
+
 #ifndef div_u64_rem
 static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
 {
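For illustration, a minimal stand-alone sketch of the helper's contract
(user-space C; is_power_of_2() and rem_u64() are re-implemented locally
here only because this builds outside the kernel):

/* Stand-alone demo of the rem_u64() fast path; not kernel code. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static int is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

static uint32_t rem_u64(uint64_t dividend, uint32_t divisor)
{
	if (is_power_of_2(divisor))
		return dividend & (divisor - 1);	/* mask, no division */
	return dividend % divisor;
}

int main(void)
{
	/* 4096 is a power of two: reduces to a mask. */
	assert(rem_u64(8195, 4096) == 3);
	/* 40960 (an example 40k realtime extent size) is not: real modulo. */
	assert(rem_u64(8195, 40960) == 8195);
	printf("ok\n");
	return 0;
}

The power-of-two case reduces to a single mask, which matters on 32-bit
systems where a 64-by-32 division is comparatively expensive.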
From patchwork Thu Jun 13 09:00:27 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13696454
From: Zhang Yi
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, djwong@kernel.org, hch@infradead.org,
    brauner@kernel.org, david@fromorbit.com, chandanbabu@kernel.org,
    jack@suse.cz, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [PATCH -next v5 2/8] iomap: pass blocksize to iomap_truncate_page()
Date: Thu, 13 Jun 2024 17:00:27 +0800
Message-Id: <20240613090033.2246907-3-yi.zhang@huaweicloud.com>
In-Reply-To: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
References: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>

From: Zhang Yi

iomap_truncate_page() always assumes that the block size of the
truncating inode is i_blocksize(), but this is not true for all
filesystems; e.g. XFS does extent size alignment for realtime inodes.
Drop this assumption and pass the block size for zeroing into
iomap_truncate_page(), allowing filesystems to indicate the correct
block size.
Suggested-by: Dave Chinner
Signed-off-by: Zhang Yi
---
 fs/iomap/buffered-io.c | 8 ++++----
 fs/xfs/xfs_iomap.c     | 3 ++-
 include/linux/iomap.h  | 4 ++--
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9952cc3a239b..4a23c3950a47 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -17,6 +17,7 @@
 #include <linux/bio.h>
 #include <linux/sched/signal.h>
 #include <linux/migrate.h>
+#include <linux/math64.h>
 #include "trace.h"
 
 #include "../internal.h"
@@ -1453,11 +1454,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 EXPORT_SYMBOL_GPL(iomap_zero_range);
 
 int
-iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops)
+iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops)
 {
-	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
+	unsigned int off = rem_u64(pos, blocksize);
 
 	/* Block boundary? Nothing to do */
 	if (!off)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 378342673925..32306804b01b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1471,10 +1471,11 @@ xfs_truncate_page(
 	bool		*did_zero)
 {
 	struct inode	*inode = VFS_I(ip);
+	unsigned int	blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
 		return dax_truncate_page(inode, pos, did_zero,
 				&xfs_dax_write_iomap_ops);
-	return iomap_truncate_page(inode, pos, did_zero,
+	return iomap_truncate_page(inode, pos, blocksize, did_zero,
 			&xfs_buffered_write_iomap_ops);
 }
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d..d67bf86ec582 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -273,8 +273,8 @@ int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops);
-int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops);
+int iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops);
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
 			const struct iomap_ops *ops);
 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
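To see why the passed-in block size matters, a small stand-alone
arithmetic demo (user-space C; rem_u64() is re-implemented locally, and
the 40960-byte figure is just an example realtime extent size of ten 4k
blocks). A truncate to a fs-block-aligned offset looks like a no-op at
i_blocksize() granularity, yet still leaves a partial allocation unit
that needs zeroing:

/* User-space demo: sub-block offset at two different granularities. */
#include <stdint.h>
#include <stdio.h>

static uint32_t rem_u64(uint64_t dividend, uint32_t divisor)
{
	if ((divisor & (divisor - 1)) == 0)	/* power of two */
		return dividend & (divisor - 1);
	return dividend % divisor;
}

int main(void)
{
	uint64_t pos = 12288;		/* truncate target: 3 fs blocks */
	uint32_t fs_block = 4096;	/* i_blocksize() */
	uint32_t rtext = 40960;		/* example 40k realtime extent */

	/* off == 0 means "block boundary, nothing to zero". */
	printf("fs block granularity:  off = %u\n", rem_u64(pos, fs_block));
	printf("rt extent granularity: off = %u\n", rem_u64(pos, rtext));
	return 0;
}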
From patchwork Thu Jun 13 09:00:28 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13696456
From: Zhang Yi
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, djwong@kernel.org, hch@infradead.org,
    brauner@kernel.org, david@fromorbit.com, chandanbabu@kernel.org,
    jack@suse.cz, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [PATCH -next v5 3/8] fsdax: pass blocksize to dax_truncate_page()
Date: Thu, 13 Jun 2024 17:00:28 +0800
Message-Id: <20240613090033.2246907-4-yi.zhang@huaweicloud.com>
In-Reply-To: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
References: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>

From: Zhang Yi

dax_truncate_page() always assumes that the block size of the
truncating inode is i_blocksize(), but this is not true for all
filesystems; e.g. XFS does extent size alignment for realtime inodes.
Drop this assumption and pass the block size for zeroing into
dax_truncate_page(), allowing filesystems to indicate the correct block
size.
Suggested-by: Dave Chinner
Signed-off-by: Zhang Yi
---
 fs/dax.c            | 8 ++++----
 fs/ext2/inode.c     | 4 ++--
 fs/xfs/xfs_iomap.c  | 2 +-
 include/linux/dax.h | 4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index becb4a6920c6..4cbd94fd96ed 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -25,6 +25,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/iomap.h>
 #include <linux/rmap.h>
+#include <linux/math64.h>
 #include <asm/pgalloc.h>
 
 #define CREATE_TRACE_POINTS
@@ -1403,11 +1404,10 @@ int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 }
 EXPORT_SYMBOL_GPL(dax_zero_range);
 
-int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops)
+int dax_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops)
 {
-	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
+	unsigned int off = rem_u64(pos, blocksize);
 
 	/* Block boundary? Nothing to do */
 	if (!off)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 0caa1650cee8..337349c94adf 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1276,8 +1276,8 @@ static int ext2_setsize(struct inode *inode, loff_t newsize)
 	inode_dio_wait(inode);
 
 	if (IS_DAX(inode))
-		error = dax_truncate_page(inode, newsize, NULL,
-					  &ext2_iomap_ops);
+		error = dax_truncate_page(inode, newsize, i_blocksize(inode),
+					  NULL, &ext2_iomap_ops);
 	else
 		error = block_truncate_page(inode->i_mapping,
 					    newsize, ext2_get_block);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 32306804b01b..8cdfcbb5baa7 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1474,7 +1474,7 @@ xfs_truncate_page(
 	unsigned int	blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
-		return dax_truncate_page(inode, pos, did_zero,
+		return dax_truncate_page(inode, pos, blocksize, did_zero,
 				&xfs_dax_write_iomap_ops);
 	return iomap_truncate_page(inode, pos, blocksize, did_zero,
 			&xfs_buffered_write_iomap_ops);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9d3e3327af4c..4aa8ef7c8fd4 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -210,8 +210,8 @@ int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops);
-int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops);
+int dax_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops);
 
 #if IS_ENABLED(CONFIG_DAX)
 int dax_read_lock(void);
From patchwork Thu Jun 13 09:00:29 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13696457
From: Zhang Yi
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, djwong@kernel.org, hch@infradead.org,
    brauner@kernel.org, david@fromorbit.com, chandanbabu@kernel.org,
    jack@suse.cz, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [PATCH -next v5 4/8] xfs: refactor the truncating order
Date: Thu, 13 Jun 2024 17:00:29 +0800
Message-Id: <20240613090033.2246907-5-yi.zhang@huaweicloud.com>
In-Reply-To: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
References: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>

From: Zhang Yi

When truncating down an inode, we call xfs_truncate_page() to zero out
the tail partial block beyond the new EOF, which prevents exposing
stale data. But xfs_truncate_page() always assumes the blocksize is
i_blocksize(inode), which is not true if the file has a large
allocation unit that it should be aligned to; e.g.
a realtime inode should be aligned to the rtextsize.

The current xfs_setattr_size() cannot zero out a large alignment size
on truncate down because the order of operations is wrong: we first
zero out through xfs_truncate_page(), and then immediately update the
inode size through truncate_setsize(). If the zeroed range is larger
than a folio, the writeback path will not write back the zeroed
pagecache beyond the EOF folio, so it does not write zeroes to the
entire tail extent and could expose stale data after an appending write
into the next aligned extent.

Adjust the order to: zero out the tail aligned blocks, write back the
zeroed or cached data, update i_size, and only then drop all the
pagecache beyond the allocation unit containing EOF. This prepares for
the realtime inode fix and for the upcoming forced alignment feature.

Signed-off-by: Zhang Yi
---
 fs/xfs/xfs_iomap.c |   2 +-
 fs/xfs/xfs_iomap.h |   3 +-
 fs/xfs/xfs_iops.c  | 162 +++++++++++++++++++++++++++++----------------
 3 files changed, 109 insertions(+), 58 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 8cdfcbb5baa7..0369b64cc3f4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1468,10 +1468,10 @@ int
 xfs_truncate_page(
 	struct xfs_inode	*ip,
 	loff_t			pos,
+	unsigned int		blocksize,
 	bool			*did_zero)
 {
 	struct inode		*inode = VFS_I(ip);
-	unsigned int		blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
 		return dax_truncate_page(inode, pos, blocksize, did_zero,
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 4da13440bae9..feb1610cb645 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -25,7 +25,8 @@ int xfs_bmbt_to_iomap(struct xfs_inode *ip, struct iomap *iomap,
 
 int xfs_zero_range(struct xfs_inode *ip, loff_t pos, loff_t len,
 		bool *did_zero);
-int xfs_truncate_page(struct xfs_inode *ip, loff_t pos, bool *did_zero);
+int xfs_truncate_page(struct xfs_inode *ip, loff_t pos,
+		unsigned int blocksize, bool *did_zero);
 
 static inline xfs_filblks_t
 xfs_aligned_fsb_count(
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ff222827e550..0919a42cceb6 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -792,6 +792,108 @@ xfs_setattr_nonsize(
 	return error;
 }
 
+/*
+ * Zero and flush data on truncate.
+ *
+ * Zero out any data beyond EOF on a size-changing truncate, write back
+ * all cached data if we need to extend the ondisk EOF, and drop all the
+ * pagecache beyond the new EOF block.
+ */
+STATIC int
+xfs_setattr_truncate_data(
+	struct xfs_inode	*ip,
+	xfs_off_t		oldsize,
+	xfs_off_t		newsize)
+{
+	struct inode		*inode = VFS_I(ip);
+	bool			did_zeroing = false;
+	bool			extending_ondisk_eof;
+	unsigned int		blocksize;
+	int			error;
+
+	extending_ondisk_eof = newsize > ip->i_disk_size &&
+			       oldsize != ip->i_disk_size;
+
+	/*
+	 * Start with zeroing any data beyond EOF that we may expose on file
+	 * extension, or zeroing out the rest of the block on a downward
+	 * truncate.
+	 *
+	 * We've already locked out new page faults, so now we can safely call
+	 * truncate_setsize() or truncate_pagecache() to remove pages from the
+	 * page cache knowing they won't get refaulted until we drop the
+	 * XFS_MMAPLOCK_EXCL after the extent manipulations are complete. The
+	 * truncate_setsize() call also cleans partial EOF page PTEs on
+	 * extending truncates and hence ensures sub-page block size
+	 * filesystems are correctly handled, too.
+	 */
+	if (newsize >= oldsize) {
+		/* File extension */
+		if (newsize != oldsize) {
+			trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
+			error = xfs_zero_range(ip, oldsize, newsize - oldsize,
+					       &did_zeroing);
+			if (error)
+				return error;
+		}
+
+		truncate_setsize(inode, newsize);
+
+		/*
+		 * We are going to log the inode size change in this
+		 * transaction, so any previous writes that are beyond the on
+		 * disk EOF and the new EOF that have not been written out need
+		 * to be written here. If we do not write the data out, we
+		 * expose ourselves to the null files problem. Note that this
+		 * includes any block zeroing we did above; otherwise those
+		 * blocks may not be zeroed after a crash.
+		 */
+		if (did_zeroing || extending_ondisk_eof) {
+			error = filemap_write_and_wait_range(inode->i_mapping,
+					ip->i_disk_size, newsize - 1);
+			if (error)
+				return error;
+		}
+		return 0;
+	}
+
+	/* Truncate down */
+	blocksize = i_blocksize(inode);
+
+	/*
+	 * iomap won't detect a dirty page over an unwritten block (or a cow
+	 * block over a hole) and subsequently skips zeroing the newly post-EOF
+	 * portion of the page. Flush the new EOF to convert the block before
+	 * the pagecache truncate.
+	 */
+	error = filemap_write_and_wait_range(inode->i_mapping, newsize,
+					roundup_64(newsize, blocksize) - 1);
+	if (error)
+		return error;
+
+	error = xfs_truncate_page(ip, newsize, blocksize, &did_zeroing);
+	if (error)
+		return error;
+
+	if (did_zeroing || extending_ondisk_eof) {
+		error = filemap_write_and_wait_range(inode->i_mapping,
+				min_t(loff_t, ip->i_disk_size, newsize),
+				roundup_64(newsize, blocksize) - 1);
+		if (error)
+			return error;
+	}
+
+	/*
+	 * Open-code truncate_setsize(): update the incore i_size after
+	 * flushing dirty tail pages to disk, don't zero out the partial EOF
+	 * folio (which may contain already-zeroed tail blocks) again, and
+	 * just drop all the pagecache beyond the allocation unit containing
+	 * EOF.
+	 */
+	i_size_write(inode, newsize);
+	truncate_pagecache(inode, roundup_64(newsize, blocksize));
+	return 0;
+}
+
 /*
  * Truncate file.  Must have write permission and not be a directory.
  *
@@ -811,7 +913,6 @@ xfs_setattr_size(
 	struct xfs_trans	*tp;
 	int			error;
 	uint			lock_flags = 0;
-	bool			did_zeroing = false;
 
 	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
 	ASSERT(S_ISREG(inode->i_mode));
@@ -853,40 +954,7 @@ xfs_setattr_size(
 	 * the transaction because the inode cannot be unlocked once it is a
 	 * part of the transaction.
 	 *
-	 * Start with zeroing any data beyond EOF that we may expose on file
-	 * extension, or zeroing out the rest of the block on a downward
-	 * truncate.
-	 */
-	if (newsize > oldsize) {
-		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
-		error = xfs_zero_range(ip, oldsize, newsize - oldsize,
-				&did_zeroing);
-	} else {
-		/*
-		 * iomap won't detect a dirty page over an unwritten block (or a
-		 * cow block over a hole) and subsequently skips zeroing the
-		 * newly post-EOF portion of the page. Flush the new EOF to
-		 * convert the block before the pagecache truncate.
-		 */
-		error = filemap_write_and_wait_range(inode->i_mapping, newsize,
-				newsize);
-		if (error)
-			return error;
-		error = xfs_truncate_page(ip, newsize, &did_zeroing);
-	}
-
-	if (error)
-		return error;
-
-	/*
-	 * We've already locked out new page faults, so now we can safely remove
-	 * pages from the page cache knowing they won't get refaulted until we
-	 * drop the XFS_MMAP_EXCL lock after the extent manipulations are
-	 * complete. The truncate_setsize() call also cleans partial EOF page
-	 * PTEs on extending truncates and hence ensures sub-page block size
-	 * filesystems are correctly handled, too.
-	 *
-	 * We have to do all the page cache truncate work outside the
+	 * We also have to do all the page cache truncate work outside the
 	 * transaction context as the "lock" order is page lock->log space
 	 * reservation as defined by extent allocation in the writeback path.
 	 * Hence a truncate can fail with ENOMEM from xfs_trans_alloc(), but
@@ -894,28 +962,10 @@ xfs_setattr_size(
 	 * user visible changes). There's not much we can do about this, except
 	 * to hope that the caller sees ENOMEM and retries the truncate
 	 * operation.
-	 *
-	 * And we update in-core i_size and truncate page cache beyond newsize
-	 * before writeback the [i_disk_size, newsize] range, so we're
-	 * guaranteed not to write stale data past the new EOF on truncate down.
 	 */
-	truncate_setsize(inode, newsize);
-
-	/*
-	 * We are going to log the inode size change in this transaction so
-	 * any previous writes that are beyond the on disk EOF and the new
-	 * EOF that have not been written out need to be written here.  If we
-	 * do not write the data out, we expose ourselves to the null files
-	 * problem. Note that this includes any block zeroing we did above;
-	 * otherwise those blocks may not be zeroed after a crash.
-	 */
-	if (did_zeroing ||
-	    (newsize > ip->i_disk_size && oldsize != ip->i_disk_size)) {
-		error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
-						ip->i_disk_size, newsize - 1);
-		if (error)
-			return error;
-	}
+	error = xfs_setattr_truncate_data(ip, oldsize, newsize);
+	if (error)
+		return error;
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
 	if (error)
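The heart of the reordered truncate-down path is rounding the new size
up to the allocation unit. A stand-alone sketch of that window
computation (user-space C; roundup_64() is re-implemented here, and all
sizes are example values, not taken from the patch):

/* User-space demo of the truncate-down zero/flush/drop windows. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Like the kernel's roundup_64(): round x up to a multiple of unit. */
static uint64_t roundup_64(uint64_t x, uint64_t unit)
{
	return ((x + unit - 1) / unit) * unit;
}

int main(void)
{
	uint64_t newsize = 4096;	/* truncate target */
	uint64_t blocksize = 40960;	/* allocation unit, e.g. 10-block rtextsize */
	uint64_t end = roundup_64(newsize, blocksize);

	/* Zero and write back [newsize, end - 1] before updating i_size... */
	printf("zero/flush window: [%" PRIu64 ", %" PRIu64 "]\n",
	       newsize, end - 1);
	/* ...then drop the pagecache only from the rounded-up boundary on. */
	printf("truncate_pagecache from: %" PRIu64 "\n", end);
	return 0;
}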
From patchwork Thu Jun 13 09:00:30 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13696458
From: Zhang Yi
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, djwong@kernel.org, hch@infradead.org,
    brauner@kernel.org, david@fromorbit.com, chandanbabu@kernel.org,
    jack@suse.cz, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [PATCH -next v5 5/8] xfs: correct the truncate blocksize of realtime inode
Date: Thu, 13 Jun 2024 17:00:30 +0800
Message-Id: <20240613090033.2246907-6-yi.zhang@huaweicloud.com>
In-Reply-To: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
References: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>

From: Zhang Yi

When doing an unaligned truncate down of a realtime file whose
sb_rextsize is bigger than one block, xfs_truncate_page() only zeroes
out the tail EOF block, which can expose stale data since commit
943bc0882ceb ("iomap: don't increase i_size if it's not a write
operation").

If we truncate a file that contains a large enough written extent:

     |<    rtext    >|<    rtext    >|
     ...WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
        ^ (new EOF)               ^ old EOF

Since we only zero out the tail of the EOF block, and
xfs_itruncate_extents() unmaps the whole aligned extents, we end up in
this state:

     |<    rtext    >|
     ...WWWzWWWWWWWWWWWWW
           ^ new EOF

Then, if we do an extending write like this, the blocks in the previous
tail extent become stale:

     |<    rtext    >|
     ...WWWzSSSSSSSSSSSSS..........WWWWWWWWWWWWWWWWW
           ^ old EOF          ^ append start   ^ new EOF

Fix this by zeroing out the tail allocation unit, and also make sure
xfs_itruncate_extents() unmaps extents aligned to the allocation unit.
Signed-off-by: Zhang Yi
---
 fs/xfs/xfs_inode.c | 3 ++-
 fs/xfs/xfs_iops.c  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 58fb7a5062e1..92daa2279053 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1511,7 +1511,8 @@ xfs_itruncate_extents_flags(
 	 * We have to free all the blocks to the bmbt maximum offset, even if
 	 * the page cache can't scale that far.
 	 */
-	first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
+	first_unmap_block = XFS_B_TO_FSB(mp,
+			roundup_64(new_size, xfs_inode_alloc_unitsize(ip)));
 	if (!xfs_verify_fileoff(mp, first_unmap_block)) {
 		WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
 		return 0;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 0919a42cceb6..8e7e6c435fb3 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -858,7 +858,7 @@ xfs_setattr_truncate_data(
 	}
 
 	/* Truncate down */
-	blocksize = i_blocksize(inode);
+	blocksize = xfs_inode_alloc_unitsize(ip);
 
 	/*
 	 * iomap won't detect a dirty page over an unwritten block (or a cow
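A quick arithmetic check of the first_unmap_block change (stand-alone
user-space C; XFS_B_TO_FSB() is modeled as a simple byte-to-block
division for a 4k block size, and all sizes are example values):
without rounding up to the allocation unit, unmapping starts inside a
realtime extent whose tail blocks were never zeroed.

/* User-space demo: where extent unmapping starts, before vs. after. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t roundup_64(uint64_t x, uint64_t unit)
{
	return ((x + unit - 1) / unit) * unit;
}

int main(void)
{
	uint64_t new_size = 4096;	/* unaligned truncate target */
	uint64_t blksize = 4096;	/* fs block size */
	uint64_t unitsize = 40960;	/* allocation unit: 10 fs blocks */

	/* Before: new_size rounded to fs blocks -> block 1, mid-extent. */
	printf("old first_unmap_block: %" PRIu64 "\n",
	       roundup_64(new_size, blksize) / blksize);
	/* After: round to the allocation unit first -> block 10. */
	printf("new first_unmap_block: %" PRIu64 "\n",
	       roundup_64(new_size, unitsize) / blksize);
	return 0;
}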
From patchwork Thu Jun 13 09:00:31 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13696459
From: Zhang Yi
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, djwong@kernel.org, hch@infradead.org,
    brauner@kernel.org, david@fromorbit.com, chandanbabu@kernel.org,
    jack@suse.cz, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [PATCH -next v5 6/8] xfs: reserve blocks for truncating large realtime inode
Date: Thu, 13 Jun 2024 17:00:31 +0800
Message-Id: <20240613090033.2246907-7-yi.zhang@huaweicloud.com>
In-Reply-To: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
References: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>

From: Zhang Yi

For a large realtime inode, __xfs_bunmapi() could split the tail
written extent and convert the part beyond the EOF block to unwritten,
but this cannot work as expected on truncate down right now, since the
block reservation is zero in xfs_setattr_size(). Fix this by reserving
XFS_DIOSTRAT_SPACE_RES blocks for large realtime inodes.

Signed-off-by: Zhang Yi
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_iops.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 8e7e6c435fb3..8af13fd37f1b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -17,6 +17,8 @@
 #include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trans.h"
+#include "xfs_trans_space.h"
+#include "xfs_bmap_btree.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_symlink.h"
@@ -913,6 +915,7 @@ xfs_setattr_size(
 	struct xfs_trans	*tp;
 	int			error;
 	uint			lock_flags = 0;
+	uint			resblks = 0;
 
 	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
 	ASSERT(S_ISREG(inode->i_mode));
@@ -967,7 +970,17 @@ xfs_setattr_size(
 	if (error)
 		return error;
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+	/*
+	 * For a realtime inode with a rtextsize of more than one block, we
+	 * need a block reservation for the bmap btree block allocations or
+	 * splits that can happen, since the unmap could split the tail
+	 * written extent and convert the part beyond EOF to unwritten.
+	 */
+	if (xfs_inode_has_bigrtalloc(ip))
+		resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
+				0, 0, &tp);
 	if (error)
 		return error;
From patchwork Thu Jun 13 09:00:32 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13696460
From: Zhang Yi
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, djwong@kernel.org, hch@infradead.org,
    brauner@kernel.org, david@fromorbit.com, chandanbabu@kernel.org,
    jack@suse.cz, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [PATCH -next v5 7/8] xfs: speed up truncating down a big realtime inode
Date: Thu, 13 Jun 2024 17:00:32 +0800
Message-Id: <20240613090033.2246907-8-yi.zhang@huaweicloud.com>
In-Reply-To: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
References: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>

From: Zhang Yi

When truncating down a big realtime inode, zeroing out the entire
aligned EOF extent gets slower and slower as the rtextsize increases.
Fortunately, __xfs_bunmapi() aligns the unmapped range to the
rtextsize, splitting the tail extent and converting the blocks beyond
EOF to unwritten. So speed this up by adjusting the unit size to the
filesystem block size when truncating down a large realtime inode and
letting __xfs_bunmapi() convert the tail blocks to unwritten; this
improves performance significantly:

 # mkfs.xfs -f -rrtdev=/dev/pmem1s -f -m reflink=0,rmapbt=0, \
   -d rtinherit=1 -r extsize=$rtextsize /dev/pmem2s
 # mount -ortdev=/dev/pmem1s /dev/pmem2s /mnt/scratch
 # for i in {1..1000}; \
   do dd if=/dev/zero of=/mnt/scratch/$i bs=$rtextsize count=1024; done
 # sync
 # time for i in {1..1000}; \
   do xfs_io -c "truncate 4k" /mnt/scratch/$i; done

 rtextsize      8k      16k     32k     64k     256k    1024k
 before:     9.601s  10.229s 11.153s 12.086s 12.259s 20.141s
 after:      9.710s   9.642s  9.958s  9.441s 10.021s 10.526s

Signed-off-by: Zhang Yi
---
 fs/xfs/xfs_inode.c | 10 ++++++++--
 fs/xfs/xfs_iops.c  |  9 +++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 92daa2279053..5e837ed093b0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1487,6 +1487,7 @@ xfs_itruncate_extents_flags(
 	struct xfs_trans	*tp = *tpp;
 	xfs_fileoff_t		first_unmap_block;
 	int			error = 0;
+	unsigned int		unitsize = xfs_inode_alloc_unitsize(ip);
 
 	xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
 	if (atomic_read(&VFS_I(ip)->i_count))
@@ -1510,9 +1511,14 @@ xfs_itruncate_extents_flags(
 	 *
 	 * We have to free all the blocks to the bmbt maximum offset, even if
 	 * the page cache can't scale that far.
+	 *
+	 * For a big realtime inode, don't align to the allocation unitsize;
+	 * __xfs_bunmapi() will split the extent and convert the tail blocks
+	 * to unwritten.
 	 */
-	first_unmap_block = XFS_B_TO_FSB(mp,
-			roundup_64(new_size, xfs_inode_alloc_unitsize(ip)));
+	if (xfs_inode_has_bigrtalloc(ip))
+		unitsize = i_blocksize(VFS_I(ip));
+	first_unmap_block = XFS_B_TO_FSB(mp, roundup_64(new_size, unitsize));
+
 	if (!xfs_verify_fileoff(mp, first_unmap_block)) {
 		WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
 		return 0;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 8af13fd37f1b..1903c06d39bc 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -862,6 +862,15 @@ xfs_setattr_truncate_data(
 	/* Truncate down */
 	blocksize = xfs_inode_alloc_unitsize(ip);
 
+	/*
+	 * For a big realtime inode, zeroing out the entire EOF extent gets
+	 * slower as the rtextsize increases; speed it up by adjusting the
+	 * blocksize to the filesystem blocksize and letting __xfs_bunmapi()
+	 * split the extent and convert the tail blocks to unwritten.
+	 */
+	if (xfs_inode_has_bigrtalloc(ip))
+		blocksize = i_blocksize(inode);
+
 	/*
 	 * iomap won't detect a dirty page over an unwritten block (or a cow
 	 * block over a hole) and subsequently skips zeroing the newly post-EOF
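The win is visible in the arithmetic alone (stand-alone user-space C
sketch; sizes are example values). With a 1M rtextsize, zeroing to the
allocation-unit boundary touches about a megabyte of pagecache per
file, while zeroing only to the fs block boundary touches at most one
block and leaves the rest to the unwritten conversion in
__xfs_bunmapi():

/* User-space demo: bytes zeroed per truncate, by zeroing unit. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t roundup_64(uint64_t x, uint64_t unit)
{
	return ((x + unit - 1) / unit) * unit;
}

int main(void)
{
	uint64_t newsize = 6144;		/* truncate into a block */
	uint64_t blocksize = 4096;		/* fs block size */
	uint64_t rtextsize = 1024 * 1024;	/* 1M realtime extent */

	printf("zero to rtext boundary: %" PRIu64 " bytes\n",
	       roundup_64(newsize, rtextsize) - newsize);	/* 1042432 */
	printf("zero to block boundary: %" PRIu64 " bytes\n",
	       roundup_64(newsize, blocksize) - newsize);	/* 2048 */
	return 0;
}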
From patchwork Thu Jun 13 09:00:33 2024
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 13696461
From: Zhang Yi
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, djwong@kernel.org, hch@infradead.org,
    brauner@kernel.org, david@fromorbit.com, chandanbabu@kernel.org,
    jack@suse.cz, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [PATCH -next v5 8/8] iomap: don't increase i_size in iomap_write_end()
Date: Thu, 13 Jun 2024 17:00:33 +0800
Message-Id: <20240613090033.2246907-9-yi.zhang@huaweicloud.com>
In-Reply-To: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
References: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>

From: Zhang Yi

This reverts commit 0841ea4a3b41 ("iomap: keep on increasing i_size in
iomap_write_end()"). Now that XFS zeroes out the tail blocks aligned to
the allocation unit size and converts the tail blocks to unwritten for
realtime inodes on truncate down, an unaligned truncate down of a
realtime inode can no longer expose stale data, so we can once again
stop increasing i_size for IOMAP_UNSHARE and IOMAP_ZERO operations in
iomap_write_end().

Signed-off-by: Zhang Yi
---
 fs/iomap/buffered-io.c | 53 +++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4a23c3950a47..75360128f1da 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -891,37 +891,22 @@ static bool iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 		size_t copied, struct folio *folio)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
-	loff_t old_size = iter->inode->i_size;
-	size_t written;
 
 	if (srcmap->type == IOMAP_INLINE) {
 		iomap_write_end_inline(iter, folio, pos, copied);
-		written = copied;
-	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
-		written = block_write_end(NULL, iter->inode->i_mapping, pos,
-				len, copied, &folio->page, NULL);
-		WARN_ON_ONCE(written != copied && written != 0);
-	} else {
-		written = __iomap_write_end(iter->inode, pos, len, copied,
-				folio) ? copied : 0;
+		return true;
 	}
 
-	/*
-	 * Update the in-memory inode size after copying the data into the page
-	 * cache. It's up to the file system to write the updated size to disk,
-	 * preferably after I/O completion so that no stale data is exposed.
-	 * Only once that's done can we unlock and release the folio.
-	 */
-	if (pos + written > old_size) {
-		i_size_write(iter->inode, pos + written);
-		iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
-	}
-	__iomap_put_folio(iter, pos, written, folio);
+	if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
+		size_t bh_written;
 
-	if (old_size < pos)
-		pagecache_isize_extended(iter->inode, old_size, pos);
+		bh_written = block_write_end(NULL, iter->inode->i_mapping, pos,
+				len, copied, &folio->page, NULL);
+		WARN_ON_ONCE(bh_written != copied && bh_written != 0);
+		return bh_written == copied;
+	}
 
-	return written == copied;
+	return __iomap_write_end(iter->inode, pos, len, copied, folio);
 }
 
 static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
@@ -936,6 +921,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 
 	do {
 		struct folio *folio;
+		loff_t old_size;
 		size_t offset;		/* Offset into folio */
 		size_t bytes;		/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
@@ -987,6 +973,23 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		written = iomap_write_end(iter, pos, bytes, copied, folio) ?
 			  copied : 0;
 
+		/*
+		 * Update the in-memory inode size after copying the data into
+		 * the page cache. It's up to the file system to write the
+		 * updated size to disk, preferably after I/O completion so
+		 * that no stale data is exposed. Only once that's done can we
+		 * unlock and release the folio.
+		 */
+		old_size = iter->inode->i_size;
+		if (pos + written > old_size) {
+			i_size_write(iter->inode, pos + written);
+			iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
+		}
+		__iomap_put_folio(iter, pos, written, folio);
+
+		if (old_size < pos)
+			pagecache_isize_extended(iter->inode, old_size, pos);
+
 		cond_resched();
 		if (unlikely(written == 0)) {
 			/*
@@ -1357,6 +1360,7 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 		bytes = folio_size(folio) - offset;
 
 		ret = iomap_write_end(iter, pos, bytes, bytes, folio);
+		__iomap_put_folio(iter, pos, bytes, folio);
 		if (WARN_ON_ONCE(!ret))
 			return -EIO;
 
@@ -1422,6 +1426,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		folio_mark_accessed(folio);
 
 		ret = iomap_write_end(iter, pos, bytes, bytes, folio);
+		__iomap_put_folio(iter, pos, bytes, folio);
 		if (WARN_ON_ONCE(!ret))
 			return -EIO;
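To summarize the behavioral change, a toy user-space model (all names
are invented; this mirrors only the control flow, not the real iomap
machinery): after the revert, only the buffered-write loop grows
i_size, while the zero and unshare loops leave it alone.

/* Toy model of the revert: who is allowed to grow i_size. */
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t i_size;		/* stands in for inode->i_size */

/* Models iomap_write_end(): commits the folio, never touches i_size. */
static int write_end(uint64_t pos, uint64_t copied)
{
	(void)pos;
	return copied != 0;	/* nonzero means the copy was committed */
}

/* Models iomap_write_iter(): now the only path that grows i_size. */
static void write_iter(uint64_t pos, uint64_t len)
{
	if (write_end(pos, len) && pos + len > i_size)
		i_size = pos + len;
}

/* Models iomap_zero_iter()/iomap_unshare_iter(): size-neutral. */
static void zero_iter(uint64_t pos, uint64_t len)
{
	write_end(pos, len);	/* zeroes pagecache, i_size unchanged */
}

int main(void)
{
	write_iter(0, 4096);		/* buffered write extends EOF */
	assert(i_size == 4096);
	zero_iter(4096, 36864);		/* post-EOF zeroing on truncate */
	assert(i_size == 4096);		/* EOF must not move */
	printf("i_size = %" PRIu64 "\n", i_size);
	return 0;
}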