diff mbox series

[4/4] xfs: report the correct read/write dio alignment for reflinked inodes

Message ID 20250106151607.954940-5-hch@lst.de (mailing list archive)
State New
Headers show
Series [1/4] fs: reformat the statx definition | expand

Commit Message

Christoph Hellwig Jan. 6, 2025, 3:15 p.m. UTC
For I/O to reflinked blocks we always need to write an entire new
file system block, and the code enforces the file system block alignment
for the entire file if it has any reflinked blocks.

Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
vs write alignments for reflinked files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iops.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

Comments

Darrick J. Wong Jan. 6, 2025, 5:33 p.m. UTC | #1
On Mon, Jan 06, 2025 at 04:15:57PM +0100, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
> 
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iops.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..053d05f5567d 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -580,9 +580,27 @@ xfs_report_dioalign(
>  	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>  	struct block_device	*bdev = target->bt_bdev;
>  
> -	stat->result_mask |= STATX_DIOALIGN;
> +	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
>  	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> -	stat->dio_offset_align = bdev_logical_block_size(bdev);
> +	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> +
> +	/*
> +	 * On COW inodes we are forced to always rewrite an entire file system
> +	 * block or RT extent.
> +	 *
> +	 * Because applications assume they can do sector sized direct writes
> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
> +	 * case.  Because that needs to copy the entire block into the buffer
> +	 * cache it is highly inefficient and can easily lead to page cache
> +	 * invalidation races.
> +	 *
> +	 * Tell applications to avoid this case by reporting the natively
> +	 * supported direct I/O read alignment.
> +	 */
> +	if (xfs_is_cow_inode(ip))
> +		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
> +	else
> +		stat->dio_offset_align = stat->dio_read_offset_align;

This is a good improvement!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

>  }
>  
>  static void
> @@ -658,7 +676,7 @@ xfs_vn_getattr(
>  		stat->rdev = inode->i_rdev;
>  		break;
>  	case S_IFREG:
> -		if (request_mask & STATX_DIOALIGN)
> +		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
>  			xfs_report_dioalign(ip, stat);
>  		if (request_mask & STATX_WRITE_ATOMIC)
>  			xfs_report_atomic_write(ip, stat);
> -- 
> 2.45.2
> 
>
John Garry Jan. 6, 2025, 6:37 p.m. UTC | #2
On 06/01/2025 15:15, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
> 
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/xfs/xfs_iops.c | 24 +++++++++++++++++++++---
>   1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..053d05f5567d 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -580,9 +580,27 @@ xfs_report_dioalign(
>   	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>   	struct block_device	*bdev = target->bt_bdev;
>   
> -	stat->result_mask |= STATX_DIOALIGN;
> +	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
>   	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> -	stat->dio_offset_align = bdev_logical_block_size(bdev);
> +	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> +
> +	/*
> +	 * On COW inodes we are forced to always rewrite an entire file system
> +	 * block or RT extent.
> +	 *
> +	 * Because applications assume they can do sector sized direct writes
> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
> +	 * case.  Because that needs to copy the entire block into the buffer
> +	 * cache it is highly inefficient and can easily lead to page cache
> +	 * invalidation races.
> +	 *
> +	 * Tell applications to avoid this case by reporting the natively
> +	 * supported direct I/O read alignment.

Maybe I mis-read the complete comment, but did you really mean "natively 
supported direct I/O write alignment"? You have been talking about 
writes only, but then finally mention read alignment.

> +	 */
> +	if (xfs_is_cow_inode(ip))
> +		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
> +	else
> +		stat->dio_offset_align = stat->dio_read_offset_align;
>   }
>   
>   static void
> @@ -658,7 +676,7 @@ xfs_vn_getattr(
>   		stat->rdev = inode->i_rdev;
>   		break;
>   	case S_IFREG:
> -		if (request_mask & STATX_DIOALIGN)
> +		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
>   			xfs_report_dioalign(ip, stat);
>   		if (request_mask & STATX_WRITE_ATOMIC)
>   			xfs_report_atomic_write(ip, stat);
Christoph Hellwig Jan. 7, 2025, 6:10 a.m. UTC | #3
On Mon, Jan 06, 2025 at 06:37:06PM +0000, John Garry wrote:
>> +	/*
>> +	 * On COW inodes we are forced to always rewrite an entire file system
>> +	 * block or RT extent.
>> +	 *
>> +	 * Because applications assume they can do sector sized direct writes
>> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
>> +	 * case.  Because that needs to copy the entire block into the buffer
>> +	 * cache it is highly inefficient and can easily lead to page cache
>> +	 * invalidation races.
>> +	 *
>> +	 * Tell applications to avoid this case by reporting the natively
>> +	 * supported direct I/O read alignment.
>
> Maybe I mis-read the complete comment, but did you really mean "natively 
> supported direct I/O write alignment"? You have been talking about writes 
> only, but then finally mention read alignment.

No, this is indeed intended to talk about the different (smaller) read
alignment we are now reporting.  But I guess the wording is confusing
enough that I should improve it?
Darrick J. Wong Jan. 7, 2025, 7:03 a.m. UTC | #4
On Tue, Jan 07, 2025 at 07:10:12AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 06:37:06PM +0000, John Garry wrote:
> >> +	/*
> >> +	 * On COW inodes we are forced to always rewrite an entire file system
> >> +	 * block or RT extent.
> >> +	 *
> >> +	 * Because applications assume they can do sector sized direct writes
> >> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
> >> +	 * case.  Because that needs to copy the entire block into the buffer
> >> +	 * cache it is highly inefficient and can easily lead to page cache
> >> +	 * invalidation races.
> >> +	 *
> >> +	 * Tell applications to avoid this case by reporting the natively
> >> +	 * supported direct I/O read alignment.
> >
> > Maybe I mis-read the complete comment, but did you really mean "natively 
> > supported direct I/O write alignment"? You have been talking about writes 
> > only, but then finally mention read alignment.
> 
> No, this is indeed intended to talk about the different (smaller) read
> alignment we are now reporting.  But I guess the wording is confusing
> enough that I should improve it?

How about:

/*
 * For COW inodes, we can only perform out of place writes of entire
 * file allocation units (clusters).  For a sub-cluster directio write,
 * we must fall back to buffered I/O to perform the RMW.  At best this
 * is highly inefficient; at worst it leads to page cache invalidation
 * races.  Tell applications to avoid this by reporting separately the
 * read and (larger) write alignments.
 */

--D
John Garry Jan. 7, 2025, 9 a.m. UTC | #5
On 07/01/2025 07:03, Darrick J. Wong wrote:
> On Tue, Jan 07, 2025 at 07:10:12AM +0100, Christoph Hellwig wrote:
>> On Mon, Jan 06, 2025 at 06:37:06PM +0000, John Garry wrote:
>>>> +	/*
>>>> +	 * On COW inodes we are forced to always rewrite an entire file system
>>>> +	 * block or RT extent.
>>>> +	 *
>>>> +	 * Because applications assume they can do sector sized direct writes
>>>> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
>>>> +	 * case.  Because that needs to copy the entire block into the buffer
>>>> +	 * cache it is highly inefficient and can easily lead to page cache
>>>> +	 * invalidation races.
>>>> +	 *
>>>> +	 * Tell applications to avoid this case by reporting the natively
>>>> +	 * supported direct I/O read alignment.
>>>
>>> Maybe I mis-read the complete comment, but did you really mean "natively
>>> supported direct I/O write alignment"? You have been talking about writes
>>> only, but then finally mention read alignment.
>>
>> No, this is indeed intended to talk about the different (smaller) read
>> alignment we are now reporting.  But I guess the wording is confusing
>> enough that I should improve it?

I know what you are saying, but I found the last paragraph odd. All 
along we talk about writes, but then conclude by mentioning reads (and 
not writes at all).

> 
> How about:
> 
> /*
>   * For COW inodes, we can only perform out of place writes of entire
>   * file allocation units (clusters).  For a sub-cluster directio write,
>   * we must fall back to buffered I/O to perform the RMW.  At best this
>   * is highly inefficient; at worst it leads to page cache invalidation
>   * races.  Tell applications to avoid this by reporting separately the
>   * read and (larger) write alignments.

yeah, that ending is better, but maybe Christoph wants to keep the more 
wordy original previous paragraphs.

>   */
> 
> --D
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 6b0228a21617..053d05f5567d 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -580,9 +580,27 @@  xfs_report_dioalign(
 	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
 	struct block_device	*bdev = target->bt_bdev;
 
-	stat->result_mask |= STATX_DIOALIGN;
+	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
 	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
-	stat->dio_offset_align = bdev_logical_block_size(bdev);
+	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
+
+	/*
+	 * On COW inodes we are forced to always rewrite an entire file system
+	 * block or RT extent.
+	 *
+	 * Because applications assume they can do sector sized direct writes
+	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
+	 * case.  Because that needs to copy the entire block into the buffer
+	 * cache it is highly inefficient and can easily lead to page cache
+	 * invalidation races.
+	 *
+	 * Tell applications to avoid this case by reporting the natively
+	 * supported direct I/O read alignment.
+	 */
+	if (xfs_is_cow_inode(ip))
+		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
+	else
+		stat->dio_offset_align = stat->dio_read_offset_align;
 }
 
 static void
@@ -658,7 +676,7 @@  xfs_vn_getattr(
 		stat->rdev = inode->i_rdev;
 		break;
 	case S_IFREG:
-		if (request_mask & STATX_DIOALIGN)
+		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
 			xfs_report_dioalign(ip, stat);
 		if (request_mask & STATX_WRITE_ATOMIC)
 			xfs_report_atomic_write(ip, stat);