Message ID: 20250106151607.954940-5-hch@lst.de
State:      New
Series:     [1/4] fs: reformat the statx definition
On Mon, Jan 06, 2025 at 04:15:57PM +0100, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
>
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iops.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..053d05f5567d 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -580,9 +580,27 @@ xfs_report_dioalign(
>  	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>  	struct block_device	*bdev = target->bt_bdev;
>
> -	stat->result_mask |= STATX_DIOALIGN;
> +	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
>  	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> -	stat->dio_offset_align = bdev_logical_block_size(bdev);
> +	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> +
> +	/*
> +	 * On COW inodes we are forced to always rewrite an entire file system
> +	 * block or RT extent.
> +	 *
> +	 * Because applications assume they can do sector sized direct writes
> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
> +	 * case. Because that needs to copy the entire block into the buffer
> +	 * cache it is highly inefficient and can easily lead to page cache
> +	 * invalidation races.
> +	 *
> +	 * Tell applications to avoid this case by reporting the natively
> +	 * supported direct I/O read alignment.
> +	 */
> +	if (xfs_is_cow_inode(ip))
> +		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
> +	else
> +		stat->dio_offset_align = stat->dio_read_offset_align;

This is a good improvement!

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

>  }
>
>  static void
> @@ -658,7 +676,7 @@ xfs_vn_getattr(
>  		stat->rdev = inode->i_rdev;
>  		break;
>  	case S_IFREG:
> -		if (request_mask & STATX_DIOALIGN)
> +		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
>  			xfs_report_dioalign(ip, stat);
>  		if (request_mask & STATX_WRITE_ATOMIC)
>  			xfs_report_atomic_write(ip, stat);
> --
> 2.45.2
>
>
On 06/01/2025 15:15, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
>
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iops.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..053d05f5567d 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -580,9 +580,27 @@ xfs_report_dioalign(
>  	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>  	struct block_device	*bdev = target->bt_bdev;
>
> -	stat->result_mask |= STATX_DIOALIGN;
> +	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
>  	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> -	stat->dio_offset_align = bdev_logical_block_size(bdev);
> +	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> +
> +	/*
> +	 * On COW inodes we are forced to always rewrite an entire file system
> +	 * block or RT extent.
> +	 *
> +	 * Because applications assume they can do sector sized direct writes
> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
> +	 * case. Because that needs to copy the entire block into the buffer
> +	 * cache it is highly inefficient and can easily lead to page cache
> +	 * invalidation races.
> +	 *
> +	 * Tell applications to avoid this case by reporting the natively
> +	 * supported direct I/O read alignment.

Maybe I mis-read the complete comment, but did you really mean "natively
supported direct I/O write alignment"? You have been talking about writes
only, but then finally mention read alignment.

> +	 */
> +	if (xfs_is_cow_inode(ip))
> +		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
> +	else
> +		stat->dio_offset_align = stat->dio_read_offset_align;
>  }
>
>  static void
> @@ -658,7 +676,7 @@ xfs_vn_getattr(
>  		stat->rdev = inode->i_rdev;
>  		break;
>  	case S_IFREG:
> -		if (request_mask & STATX_DIOALIGN)
> +		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
>  			xfs_report_dioalign(ip, stat);
>  		if (request_mask & STATX_WRITE_ATOMIC)
>  			xfs_report_atomic_write(ip, stat);
On Mon, Jan 06, 2025 at 06:37:06PM +0000, John Garry wrote:
>> +	/*
>> +	 * On COW inodes we are forced to always rewrite an entire file system
>> +	 * block or RT extent.
>> +	 *
>> +	 * Because applications assume they can do sector sized direct writes
>> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
>> +	 * case. Because that needs to copy the entire block into the buffer
>> +	 * cache it is highly inefficient and can easily lead to page cache
>> +	 * invalidation races.
>> +	 *
>> +	 * Tell applications to avoid this case by reporting the natively
>> +	 * supported direct I/O read alignment.
>
> Maybe I mis-read the complete comment, but did you really mean "natively
> supported direct I/O write alignment"? You have been talking about writes
> only, but then finally mention read alignment.

No, this is indeed intended to talk about the different (smaller) read
alignment we are now reporting.  But I guess the wording is confusing
enough that I should improve it?
On Tue, Jan 07, 2025 at 07:10:12AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 06:37:06PM +0000, John Garry wrote:
> >> +	/*
> >> +	 * On COW inodes we are forced to always rewrite an entire file system
> >> +	 * block or RT extent.
> >> +	 *
> >> +	 * Because applications assume they can do sector sized direct writes
> >> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
> >> +	 * case. Because that needs to copy the entire block into the buffer
> >> +	 * cache it is highly inefficient and can easily lead to page cache
> >> +	 * invalidation races.
> >> +	 *
> >> +	 * Tell applications to avoid this case by reporting the natively
> >> +	 * supported direct I/O read alignment.
> >
> > Maybe I mis-read the complete comment, but did you really mean "natively
> > supported direct I/O write alignment"? You have been talking about writes
> > only, but then finally mention read alignment.
>
> No, this is indeed intended to talk about the different (smaller) read
> alignment we are now reporting.  But I guess the wording is confusing
> enough that I should improve it?

How about:

	/*
	 * For COW inodes, we can only perform out of place writes of entire
	 * file allocation units (clusters). For a sub-cluster directio write,
	 * we must fall back to buffered I/O to perform the RMW. At best this
	 * is highly inefficient; at worst it leads to page cache invalidation
	 * races. Tell applications to avoid this by reporting separately the
	 * read and (larger) write alignments.
	 */

--D
On 07/01/2025 07:03, Darrick J. Wong wrote:
> On Tue, Jan 07, 2025 at 07:10:12AM +0100, Christoph Hellwig wrote:
>> On Mon, Jan 06, 2025 at 06:37:06PM +0000, John Garry wrote:
>>>> +	/*
>>>> +	 * On COW inodes we are forced to always rewrite an entire file system
>>>> +	 * block or RT extent.
>>>> +	 *
>>>> +	 * Because applications assume they can do sector sized direct writes
>>>> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
>>>> +	 * case. Because that needs to copy the entire block into the buffer
>>>> +	 * cache it is highly inefficient and can easily lead to page cache
>>>> +	 * invalidation races.
>>>> +	 *
>>>> +	 * Tell applications to avoid this case by reporting the natively
>>>> +	 * supported direct I/O read alignment.
>>>
>>> Maybe I mis-read the complete comment, but did you really mean "natively
>>> supported direct I/O write alignment"? You have been talking about writes
>>> only, but then finally mention read alignment.
>>
>> No, this is indeed intended to talk about the different (smaller) read
>> alignment we are now reporting.  But I guess the wording is confusing
>> enough that I should improve it?

I know what you are saying, but I found the last paragraph odd. All along
we talk about writes, but then conclude by mentioning reads (and not
writes at all).

>
> How about:
>
> /*
>  * For COW inodes, we can only perform out of place writes of entire
>  * file allocation units (clusters). For a sub-cluster directio write,
>  * we must fall back to buffered I/O to perform the RMW. At best this
>  * is highly inefficient; at worst it leads to page cache invalidation
>  * races. Tell applications to avoid this by reporting separately the
>  * read and (larger) write alignments.

yeah, that ending is better, but maybe Christoph wants to keep the more
wordy original previous paragraphs.

> */
>
> --D
>
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 6b0228a21617..053d05f5567d 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -580,9 +580,27 @@ xfs_report_dioalign(
 	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
 	struct block_device	*bdev = target->bt_bdev;
 
-	stat->result_mask |= STATX_DIOALIGN;
+	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
 	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
-	stat->dio_offset_align = bdev_logical_block_size(bdev);
+	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
+
+	/*
+	 * On COW inodes we are forced to always rewrite an entire file system
+	 * block or RT extent.
+	 *
+	 * Because applications assume they can do sector sized direct writes
+	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
+	 * case. Because that needs to copy the entire block into the buffer
+	 * cache it is highly inefficient and can easily lead to page cache
+	 * invalidation races.
+	 *
+	 * Tell applications to avoid this case by reporting the natively
+	 * supported direct I/O read alignment.
+	 */
+	if (xfs_is_cow_inode(ip))
+		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
+	else
+		stat->dio_offset_align = stat->dio_read_offset_align;
 }
 
 static void
@@ -658,7 +676,7 @@ xfs_vn_getattr(
 		stat->rdev = inode->i_rdev;
 		break;
 	case S_IFREG:
-		if (request_mask & STATX_DIOALIGN)
+		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
 			xfs_report_dioalign(ip, stat);
 		if (request_mask & STATX_WRITE_ATOMIC)
 			xfs_report_atomic_write(ip, stat);
For I/O to reflinked blocks we always need to write an entire new
file system block, and the code enforces the file system block alignment
for the entire file if it has any reflinked blocks.

Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
vs write alignments for reflinked files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iops.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)
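[Editor's note, not part of the thread: a minimal userspace sketch of how an application might consume the asymmetric alignments this patch reports. It assumes the uapi names STATX_DIO_READ_ALIGN and stx_dio_read_offset_align introduced by this series, glibc's statx() wrapper, and headers new enough to carry the field; the locally defined mask value is an assumption, so check the final linux/stat.h.]

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

/* Assumed value for the new mask bit from this series; define it locally
 * in case the installed headers predate it.  Verify against linux/stat.h. */
#ifndef STATX_DIO_READ_ALIGN
#define STATX_DIO_READ_ALIGN	0x00020000U
#endif

int main(int argc, char **argv)
{
	struct statx stx;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	/* Ask for both the classic DIO alignment info and the new read alignment. */
	if (statx(AT_FDCWD, argv[1], 0,
		  STATX_DIOALIGN | STATX_DIO_READ_ALIGN, &stx)) {
		perror("statx");
		return 1;
	}

	printf("dio mem align:            %u\n", stx.stx_dio_mem_align);
	printf("dio (write) offset align: %u\n", stx.stx_dio_offset_align);

	/* Only trust the read alignment if the kernel actually reported it. */
	if (stx.stx_result_mask & STATX_DIO_READ_ALIGN)
		printf("dio read offset align:    %u\n",
		       stx.stx_dio_read_offset_align);
	return 0;
}

On a reflinked XFS file with 4096-byte blocks sitting on a 512-byte logical block size device, this would be expected to print a 512-byte read alignment and a 4096-byte write alignment, which is exactly the asymmetry the patch exposes.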