Message ID | 20240529095206.2568162-4-yi.zhang@huaweicloud.com (mailing list archive) |
---|---|
State | New |
Series | iomap/xfs: fix stale data exposure when truncating realtime inodes |
> -		const struct iomap_ops *ops)
> +iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
> +		bool *did_zero, const struct iomap_ops *ops)
>  {
> -	unsigned int blocksize = i_blocksize(inode);
> -	unsigned int off = pos & (blocksize - 1);
> +	unsigned int off = rem_u64(pos, blocksize);
> 
>  	/* Block boundary? Nothing to do */
>  	if (!off)

Instead of passing yet another argument here, can we just kill
iomap_truncate_page?

I.e. just open code the rem_u64 and 0 offset check in the only caller
and call iomap_zero_range. Same for the DAX variant and its two
callers.
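For reference, Christoph's suggestion amounts to something like the
following in the XFS caller. This is an illustrative sketch, not a
posted patch; it assumes the rem_u64() helper added earlier in this
series and the existing dax_zero_range()/iomap_zero_range() entry
points, and the exact xfs_truncate_page() signature is guessed from
the hunks quoted in this thread:

```c
/*
 * Sketch of the open-coded variant: do the remainder and zero-offset
 * check in the caller and call the zero-range helpers directly, so
 * the truncate_page wrappers can be removed. Hypothetical code.
 */
int
xfs_truncate_page(
	struct xfs_inode	*ip,
	loff_t			pos,
	bool			*did_zero)
{
	struct inode		*inode = VFS_I(ip);
	unsigned int		blocksize = i_blocksize(inode);
	unsigned int		off = rem_u64(pos, blocksize);

	/* Block boundary? Nothing to do. */
	if (!off)
		return 0;

	if (IS_DAX(inode))
		return dax_zero_range(inode, pos, blocksize - off, did_zero,
				&xfs_dax_write_iomap_ops);
	return iomap_zero_range(inode, pos, blocksize - off, did_zero,
			&xfs_buffered_write_iomap_ops);
}
```

This shape removes the wrappers, at the cost Zhang Yi raises below:
every filesystem caller repeats the remainder and length calculation.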
On Fri, May 31, 2024 at 05:39:10AM -0700, Christoph Hellwig wrote:
> > -		const struct iomap_ops *ops)
> > +iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
> > +		bool *did_zero, const struct iomap_ops *ops)
> >  {
> > -	unsigned int blocksize = i_blocksize(inode);
> > -	unsigned int off = pos & (blocksize - 1);
> > +	unsigned int off = rem_u64(pos, blocksize);
> > 
> >  	/* Block boundary? Nothing to do */
> >  	if (!off)
> 
> Instead of passing yet another argument here, can we just kill
> iomap_truncate_page?
> 
> I.e. just open code the rem_u64 and 0 offset check in the only caller
> and call iomap_zero_range. Same for the DAX variant and its two
> callers.
> 

Hey Christoph,

I've wondered the same about killing off iomap_truncate_page(), but
JFYI, one of the several prototypes I have around of other potential
ways to address this problem slightly splits truncate page off from
being a straight zero range wrapper. A quick diff [2] of that is
inlined below for reference (only lightly tested, may be busted, etc.).

The idea is that IIRC truncate_page() was really the only zero range
user that might actually encounter dirty cache over unwritten mappings,
so given that and the typically constrained range size of truncate
page, just let it be more aggressive about bypassing the unwritten
mapping optimization in iomap_zero_iter(). Just something else to
consider, and this is definitely not something you'd want to do for
zero range proper.

Brian

P.S. I think the last patches I actually posted around this were here
[1], but I also have multiple versions of that selective flush approach
ported to iomap instead of being XFS specific as well.

[1] https://lore.kernel.org/linux-xfs/20221104182358.2007475-1-bfoster@redhat.com/
    https://lore.kernel.org/linux-xfs/20221128173945.3953659-1-bfoster@redhat.com/

[2] truncate page POC:

--- 8< ---

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c5802a459334..a261e732ea05 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1380,7 +1380,8 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
+static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
+		bool dirty_cache)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t pos = iter->pos;
@@ -1388,7 +1389,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 	loff_t written = 0;
 
 	/* already zeroed? we're done. */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+	if (srcmap->type == IOMAP_HOLE ||
+	    (!dirty_cache && srcmap->type == IOMAP_UNWRITTEN))
 		return length;
 
 	do {
@@ -1439,7 +1441,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	int ret;
 
 	while ((ret = iomap_iter(&iter, ops)) > 0)
-		iter.processed = iomap_zero_iter(&iter, did_zero);
+		iter.processed = iomap_zero_iter(&iter, did_zero, false);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);
@@ -1450,11 +1452,29 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 {
 	unsigned int blocksize = i_blocksize(inode);
 	unsigned int off = pos & (blocksize - 1);
+	struct iomap_iter iter = {
+		.inode		= inode,
+		.pos		= pos,
+		.len		= blocksize - off,
+		.flags		= IOMAP_ZERO,
+	};
+	loff_t end;
+	int ret;
+	bool dirty_cache = false;
 
 	/* Block boundary? Nothing to do */
 	if (!off)
 		return 0;
-	return iomap_zero_range(inode, pos, blocksize - off, did_zero, ops);
+
+	/* overwrite unwritten ranges if any part of the range is dirty for
+	 * truncate page */
+	end = iter.pos + iter.len - 1;
+	if (filemap_range_needs_writeback(inode->i_mapping, iter.pos, end))
+		dirty_cache = true;
+
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_zero_iter(&iter, did_zero, dirty_cache);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_truncate_page);
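To make the dirty-cache-over-unwritten case concrete, here is a rough
userspace sketch of the pattern Brian describes. It is illustrative
only, not taken from the thread: the file name and sizes are made up,
and whether stale bytes actually leak depends on the filesystem and
writeback timing:

```c
/*
 * Illustrative sketch of "dirty page cache over an unwritten extent";
 * not a guaranteed reproducer.
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("testfile", O_CREAT | O_RDWR | O_TRUNC, 0644);

	memset(buf, 0xab, sizeof(buf));

	/* Preallocate: blocks are allocated but marked unwritten. */
	posix_fallocate(fd, 0, 65536);

	/* Buffered write: dirties page cache; extent is still unwritten. */
	pwrite(fd, buf, sizeof(buf), 0);

	/*
	 * Truncate into the dirty block. If the zeroing path trusts the
	 * unwritten mapping and skips the page cache, the 0xab bytes
	 * beyond the new EOF survive in the dirty page and can surface
	 * later as stale data.
	 */
	ftruncate(fd, 1);

	close(fd);
	return 0;
}
```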
On 2024/5/31 20:39, Christoph Hellwig wrote:
>> -		const struct iomap_ops *ops)
>> +iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
>> +		bool *did_zero, const struct iomap_ops *ops)
>>  {
>> -	unsigned int blocksize = i_blocksize(inode);
>> -	unsigned int off = pos & (blocksize - 1);
>> +	unsigned int off = rem_u64(pos, blocksize);
>> 
>>  	/* Block boundary? Nothing to do */
>>  	if (!off)
> 
> Instead of passing yet another argument here, can we just kill
> iomap_truncate_page?
> 
> I.e. just open code the rem_u64 and 0 offset check in the only caller
> and call iomap_zero_range. Same for the DAX variant and its two
> callers.
> 

Yeah, we could drop iomap_truncate_page() and dax_truncate_page(), but
that means we would have to open code the zeroing length calculation,
or add a filesystem-private helper for it, in every filesystem. Right
now there are only two callers, xfs and ext2, so that looks fine, but
if more callers appear in the future this could become repetitive. If
we keep the helpers, filesystems don't have to pay attention to these
details.

Thanks,
Yi.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a3a5d4eb7289..30b49d6e9d48 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -17,6 +17,7 @@
 #include <linux/bio.h>
 #include <linux/sched/signal.h>
 #include <linux/migrate.h>
+#include <linux/math64.h>
 #include "trace.h"
 
 #include "../internal.h"
@@ -1483,11 +1484,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 EXPORT_SYMBOL_GPL(iomap_zero_range);
 
 int
-iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops)
+iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops)
 {
-	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
+	unsigned int off = rem_u64(pos, blocksize);
 
 	/* Block boundary? Nothing to do */
 	if (!off)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 378342673925..32306804b01b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1471,10 +1471,11 @@ xfs_truncate_page(
 	bool		*did_zero)
 {
 	struct inode		*inode = VFS_I(ip);
+	unsigned int		blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
 		return dax_truncate_page(inode, pos, did_zero,
 				&xfs_dax_write_iomap_ops);
-	return iomap_truncate_page(inode, pos, did_zero,
+	return iomap_truncate_page(inode, pos, blocksize, did_zero,
 			&xfs_buffered_write_iomap_ops);
 }
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d..d67bf86ec582 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -273,8 +273,8 @@ int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops);
-int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops);
+int iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops);
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
 		const struct iomap_ops *ops);
 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
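A note on the rem_u64() switch in the buffered-io.c hunk (hence the new
linux/math64.h include): `pos & (blocksize - 1)` only computes the
remainder when blocksize is a power of two, while the point of passing
blocksize explicitly is presumably to let callers such as XFS pass a
realtime allocation unit, which need not be one. A standalone
illustration of the difference, with arbitrarily chosen values:

```c
/*
 * Why the mask trick breaks for non-power-of-two sizes: illustrative
 * userspace snippet, values chosen arbitrarily (28672 = 7 * 4096, a
 * plausible realtime extent size that is not a power of two).
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t pos = 100000;
	uint32_t blocksize = 28672;

	/* Mask trick: prints 1696, which is wrong here. */
	printf("mask:      %llu\n",
	       (unsigned long long)(pos & (blocksize - 1)));
	/* True remainder, what rem_u64() computes: prints 13984. */
	printf("remainder: %llu\n",
	       (unsigned long long)(pos % blocksize));
	return 0;
}
```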