diff mbox series

[08/43] xfs: report the correct dio alignment for COW inodes

Message ID 20241211085636.1380516-9-hch@lst.de (mailing list archive)
State New
Headers show
Series [01/43] xfs: constify feature checks | expand

Commit Message

Christoph Hellwig Dec. 11, 2024, 8:54 a.m. UTC
For I/O to reflinked blocks we always need to write an entire new
file system block, and the code enforces the file system block alignment
for the entire file if it has any reflinked blocks.

Unfortunately the reported dio alignment can only report a single value
for reads and writes, so unless we want to trigger these read-modify
write cycles all the time, we need to increase both limits.

Without this zoned xfs triggers the warnings about failed page cache
invalidation in kiocb_invalidate_post_direct_write all the time when
running generic/551 when running on a 512 byte sector device, and
eventually fails the test due to miscompares.

Hopefully we can add a separate read alignment to statx eventually.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_ioctl.c |  6 +++++-
 fs/xfs/xfs_iops.c  | 15 ++++++++++++++-
 2 files changed, 19 insertions(+), 2 deletions(-)

Comments

Darrick J. Wong Dec. 12, 2024, 9:29 p.m. UTC | #1
On Wed, Dec 11, 2024 at 09:54:33AM +0100, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
> 
> Unfortunately the reported dio alignment can only report a single value
> for reads and writes, so unless we want to trigger these read-modify
> write cycles all the time, we need to increase both limits.
> 
> Without this zoned xfs triggers the warnings about failed page cache
> invalidation in kiocb_invalidate_post_direct_write all the time when
> running generic/551 when running on a 512 byte sector device, and
> eventually fails the test due to miscompares.
> 
> Hopefully we can add a separate read alignment to statx eventually.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_ioctl.c |  6 +++++-
>  fs/xfs/xfs_iops.c  | 15 ++++++++++++++-
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 726282e74d54..de8ba5345e17 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1213,7 +1213,11 @@ xfs_file_ioctl(
>  		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>  		struct dioattr		da;
>  
> -		da.d_mem =  da.d_miniosz = target->bt_logical_sectorsize;
> +		da.d_mem = target->bt_logical_sectorsize;
> +		if (xfs_is_cow_inode(ip))
> +			da.d_miniosz = mp->m_sb.sb_blocksize;
> +		else
> +			da.d_miniosz = target->bt_logical_sectorsize;
>  		da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
>  
>  		if (copy_to_user(arg, &da, sizeof(da)))
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..990df072ba35 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -582,7 +582,20 @@ xfs_report_dioalign(
>  
>  	stat->result_mask |= STATX_DIOALIGN;
>  	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> -	stat->dio_offset_align = bdev_logical_block_size(bdev);
> +
> +	/*
> +	 * On COW inodes we are forced to always rewrite an entire file system
> +	 * block.

That's not quite accurate -- we're always forced to write an entire file
allocation unit so that the rest of the bmap code doesn't have to deal
with a file range that's mapped to multiple different space extents.
For all the existing reflink scenarios the allocation unit is always an
fsblock so this is a trifling difference.

However, once we start adding reflink to the rt device then there comes
the question of needing to handle allocation unit > fsblock, and all
these bits would have to change.

IOWs, I'm saying that this should be:

	if (xfs_is_cow_inode(ip))
		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
	else
		...

Though ATM this is a distinction that doesn't make a difference.

--D

> +	 *
> +	 * Because applications assume they can do sector sized direct writes
> +	 * on XFS we provide an emulation by doing a read-modify-write cycle
> +	 * through the cache, but that is highly inefficient.  Thus report the
> +	 * natively supported size here.
> +	 */
> +	if (xfs_is_cow_inode(ip))
> +		stat->dio_offset_align = ip->i_mount->m_sb.sb_blocksize;
> +	else
> +		stat->dio_offset_align = bdev_logical_block_size(bdev);
>  }
>  
>  static void
> -- 
> 2.45.2
> 
>
Christoph Hellwig Dec. 13, 2024, 5:09 a.m. UTC | #2
On Thu, Dec 12, 2024 at 01:29:53PM -0800, Darrick J. Wong wrote:
> > +	/*
> > +	 * On COW inodes we are forced to always rewrite an entire file system
> > +	 * block.
> 
> That's not quite accurate -- we're always forced to write an entire file
> allocation unit so that the rest of the bmap code doesn't have to deal
> with a file range that's mapped to multiple different space extents.
> For all the existing reflink scenarios the allocation unit is always an
> fsblock so this is a trifling difference.

Which right now is a block until we support horrors such as reflink
on larget rtextsize inodes or the forcealign stuff.  But yes, it
could use the helper.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 726282e74d54..de8ba5345e17 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1213,7 +1213,11 @@  xfs_file_ioctl(
 		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
 		struct dioattr		da;
 
-		da.d_mem =  da.d_miniosz = target->bt_logical_sectorsize;
+		da.d_mem = target->bt_logical_sectorsize;
+		if (xfs_is_cow_inode(ip))
+			da.d_miniosz = mp->m_sb.sb_blocksize;
+		else
+			da.d_miniosz = target->bt_logical_sectorsize;
 		da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
 
 		if (copy_to_user(arg, &da, sizeof(da)))
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 6b0228a21617..990df072ba35 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -582,7 +582,20 @@  xfs_report_dioalign(
 
 	stat->result_mask |= STATX_DIOALIGN;
 	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
-	stat->dio_offset_align = bdev_logical_block_size(bdev);
+
+	/*
+	 * On COW inodes we are forced to always rewrite an entire file system
+	 * block.
+	 *
+	 * Because applications assume they can do sector sized direct writes
+	 * on XFS we provide an emulation by doing a read-modify-write cycle
+	 * through the cache, but that is highly inefficient.  Thus report the
+	 * natively supported size here.
+	 */
+	if (xfs_is_cow_inode(ip))
+		stat->dio_offset_align = ip->i_mount->m_sb.sb_blocksize;
+	else
+		stat->dio_offset_align = bdev_logical_block_size(bdev);
 }
 
 static void