diff mbox series

[v2,1/2] iomap: fix handling of dirty folios over unwritten extents

Message ID 20240828181912.41517-2-bfoster@redhat.com (mailing list archive)
State New
Headers show
Series iomap: flush dirty cache over unwritten mappings on zero range | expand

Commit Message

Brian Foster Aug. 28, 2024, 6:19 p.m. UTC
The iomap zero range implementation doesn't properly handle dirty
pagecache over unwritten mappings. It skips such mappings as if they
were pre-zeroed. If some part of an unwritten mapping is dirty in
pagecache from a previous write, the data in cache should be zeroed
as well. Instead, the data is left in cache and creates a stale data
exposure problem if writeback occurs sometime after the zero range.

Most callers are unaffected by this because the higher level
filesystem contexts that call zero range typically perform a filemap
flush of the target range for other reasons. A couple contexts that
don't otherwise need to flush are write file size extension and
truncate in XFS. The former path is currently susceptible to the
stale data exposure problem and the latter performs a flush
specifically to work around it.

This is clearly inconsistent and incomplete. As a first step toward
correcting behavior, lift the XFS workaround to iomap_zero_range()
and unconditionally flush the range before the zero range operation
proceeds. While this appears to be a bit of a big hammer, most all
users already do this from calling context save for the couple of
exceptions noted above. Future patches will optimize or elide this
flush while maintaining functional correctness.

Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure")
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 10 ++++++++++
 fs/xfs/xfs_iops.c      | 10 ----------
 2 files changed, 10 insertions(+), 10 deletions(-)

Comments

Darrick J. Wong Aug. 28, 2024, 10:22 p.m. UTC | #1
On Wed, Aug 28, 2024 at 02:19:10PM -0400, Brian Foster wrote:
> The iomap zero range implementation doesn't properly handle dirty
> pagecache over unwritten mappings. It skips such mappings as if they
> were pre-zeroed. If some part of an unwritten mapping is dirty in
> pagecache from a previous write, the data in cache should be zeroed
> as well. Instead, the data is left in cache and creates a stale data
> exposure problem if writeback occurs sometime after the zero range.
> 
> Most callers are unaffected by this because the higher level
> filesystem contexts that call zero range typically perform a filemap
> flush of the target range for other reasons. A couple contexts that
> don't otherwise need to flush are write file size extension and
> truncate in XFS. The former path is currently susceptible to the
> stale data exposure problem and the latter performs a flush
> specifically to work around it.
> 
> This is clearly inconsistent and incomplete. As a first step toward
> correcting behavior, lift the XFS workaround to iomap_zero_range()
> and unconditionally flush the range before the zero range operation
> proceeds. While this appears to be a bit of a big hammer, most all
> users already do this from calling context save for the couple of
> exceptions noted above. Future patches will optimize or elide this
> flush while maintaining functional correctness.
> 
> Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure")
> Signed-off-by: Brian Foster <bfoster@redhat.com>

I wonder why gfs2 (aka the other iomap_zero_range user) doesn't have a
truncate-down flush hammer, but maybe it doesn't support unwritten
extents?  I didn't find anything obvious when I looked, so

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

(but you might want to see if Andreas has any loud objections to this)

--D

> ---
>  fs/iomap/buffered-io.c | 10 ++++++++++
>  fs/xfs/xfs_iops.c      | 10 ----------
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f420c53d86ac..3e846f43ff48 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1451,6 +1451,16 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  	};
>  	int ret;
>  
> +	/*
> +	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
> +	 * pagecache must be flushed to ensure stale data from previous
> +	 * buffered writes is not exposed.
> +	 */
> +	ret = filemap_write_and_wait_range(inode->i_mapping,
> +			pos, pos + len - 1);
> +	if (ret)
> +		return ret;
> +
>  	while ((ret = iomap_iter(&iter, ops)) > 0)
>  		iter.processed = iomap_zero_iter(&iter, did_zero);
>  	return ret;
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 1cdc8034f54d..ddd3697e6ecd 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -870,16 +870,6 @@ xfs_setattr_size(
>  		error = xfs_zero_range(ip, oldsize, newsize - oldsize,
>  				&did_zeroing);
>  	} else {
> -		/*
> -		 * iomap won't detect a dirty page over an unwritten block (or a
> -		 * cow block over a hole) and subsequently skips zeroing the
> -		 * newly post-EOF portion of the page. Flush the new EOF to
> -		 * convert the block before the pagecache truncate.
> -		 */
> -		error = filemap_write_and_wait_range(inode->i_mapping, newsize,
> -						     newsize);
> -		if (error)
> -			return error;
>  		error = xfs_truncate_page(ip, newsize, &did_zeroing);
>  	}
>  
> -- 
> 2.45.0
> 
>
Christoph Hellwig Aug. 29, 2024, 5:43 a.m. UTC | #2
On Wed, Aug 28, 2024 at 03:22:22PM -0700, Darrick J. Wong wrote:
> I wonder why gfs2 (aka the other iomap_zero_range user) doesn't have a
> truncate-down flush hammer, but maybe it doesn't support unwritten
> extents?  I didn't find anything obvious when I looked, so

gfs2 does not support unwritten extents.
diff mbox series

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f420c53d86ac..3e846f43ff48 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1451,6 +1451,16 @@  iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	};
 	int ret;
 
+	/*
+	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
+	 * pagecache must be flushed to ensure stale data from previous
+	 * buffered writes is not exposed.
+	 */
+	ret = filemap_write_and_wait_range(inode->i_mapping,
+			pos, pos + len - 1);
+	if (ret)
+		return ret;
+
 	while ((ret = iomap_iter(&iter, ops)) > 0)
 		iter.processed = iomap_zero_iter(&iter, did_zero);
 	return ret;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1cdc8034f54d..ddd3697e6ecd 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -870,16 +870,6 @@  xfs_setattr_size(
 		error = xfs_zero_range(ip, oldsize, newsize - oldsize,
 				&did_zeroing);
 	} else {
-		/*
-		 * iomap won't detect a dirty page over an unwritten block (or a
-		 * cow block over a hole) and subsequently skips zeroing the
-		 * newly post-EOF portion of the page. Flush the new EOF to
-		 * convert the block before the pagecache truncate.
-		 */
-		error = filemap_write_and_wait_range(inode->i_mapping, newsize,
-						     newsize);
-		if (error)
-			return error;
 		error = xfs_truncate_page(ip, newsize, &did_zeroing);
 	}