diff mbox series

[06/10] xfs: don't free post-EOF blocks on read close

Message ID 20240623053532.857496-7-hch@lst.de (mailing list archive)
State Superseded, archived
Headers show
Series [01/10] xfs: fix freeing speculative preallocations for preallocated files | expand

Commit Message

Christoph Hellwig June 23, 2024, 5:34 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

When we have a workload that does open/read/close in parallel with other
allocation, the file becomes rapidly fragmented. This is due to close()
calling xfs_file_release() and removing the speculative preallocation
beyond EOF.

Add a check for a writable context to xfs_file_release to skip the
post-EOF block freeing (an the similarly pointless flushing on truncate
down).

Before:

Test 1: sync write fragmentation counts

/mnt/scratch/file.0: 919
/mnt/scratch/file.1: 916
/mnt/scratch/file.2: 919
/mnt/scratch/file.3: 920
/mnt/scratch/file.4: 920
/mnt/scratch/file.5: 921
/mnt/scratch/file.6: 916
/mnt/scratch/file.7: 918

After:

Test 1: sync write fragmentation counts

/mnt/scratch/file.0: 24
/mnt/scratch/file.1: 24
/mnt/scratch/file.2: 11
/mnt/scratch/file.3: 24
/mnt/scratch/file.4: 3
/mnt/scratch/file.5: 24
/mnt/scratch/file.6: 24
/mnt/scratch/file.7: 23

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: wordsmithing, fix commit message]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: ported to the new ->release code structure]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong June 24, 2024, 3:43 p.m. UTC | #1
On Sun, Jun 23, 2024 at 07:34:51AM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we have a workload that does open/read/close in parallel with other
> allocation, the file becomes rapidly fragmented. This is due to close()
> calling xfs_file_release() and removing the speculative preallocation
> beyond EOF.
> 
> Add a check for a writable context to xfs_file_release to skip the
> post-EOF block freeing (an the similarly pointless flushing on truncate
> down).
> 
> Before:
> 
> Test 1: sync write fragmentation counts
> 
> /mnt/scratch/file.0: 919
> /mnt/scratch/file.1: 916
> /mnt/scratch/file.2: 919
> /mnt/scratch/file.3: 920
> /mnt/scratch/file.4: 920
> /mnt/scratch/file.5: 921
> /mnt/scratch/file.6: 916
> /mnt/scratch/file.7: 918
> 
> After:
> 
> Test 1: sync write fragmentation counts
> 
> /mnt/scratch/file.0: 24
> /mnt/scratch/file.1: 24
> /mnt/scratch/file.2: 11
> /mnt/scratch/file.3: 24
> /mnt/scratch/file.4: 3
> /mnt/scratch/file.5: 24
> /mnt/scratch/file.6: 24
> /mnt/scratch/file.7: 23
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> [darrick: wordsmithing, fix commit message]
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> [hch: ported to the new ->release code structure]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I like how this has gotten much pared down from what's been lurking in
my tree for ages.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_file.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 0380e0b1d9c6c7..8d70171678fe24 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1228,12 +1228,18 @@ xfs_file_release(
>  	 * There is no point in freeing blocks here for open but unlinked files
>  	 * as they will be taken care of by the inactivation path soon.
>  	 *
> +	 * When releasing a read-only context, don't flush data or trim post-EOF
> +	 * blocks.  This avoids open/read/close workloads from removing EOF
> +	 * blocks that other writers depend upon to reduce fragmentation.
> +	 *
>  	 * If we can't get the iolock just skip truncating the blocks past EOF
>  	 * because we could deadlock with the mmap_lock otherwise. We'll get
>  	 * another chance to drop them once the last reference to the inode is
>  	 * dropped, so we'll never leak blocks permanently.
>  	 */
> -	if (inode->i_nlink && xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> +	if (inode->i_nlink &&
> +	    (file->f_mode & FMODE_WRITE) &&
> +	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
>  		if (xfs_can_free_eofblocks(ip) &&
>  		    !xfs_iflags_test(ip, XFS_IDIRTY_RELEASE)) {
>  			/*
> -- 
> 2.43.0
> 
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0380e0b1d9c6c7..8d70171678fe24 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1228,12 +1228,18 @@  xfs_file_release(
 	 * There is no point in freeing blocks here for open but unlinked files
 	 * as they will be taken care of by the inactivation path soon.
 	 *
+	 * When releasing a read-only context, don't flush data or trim post-EOF
+	 * blocks.  This avoids open/read/close workloads from removing EOF
+	 * blocks that other writers depend upon to reduce fragmentation.
+	 *
 	 * If we can't get the iolock just skip truncating the blocks past EOF
 	 * because we could deadlock with the mmap_lock otherwise. We'll get
 	 * another chance to drop them once the last reference to the inode is
 	 * dropped, so we'll never leak blocks permanently.
 	 */
-	if (inode->i_nlink && xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
+	if (inode->i_nlink &&
+	    (file->f_mode & FMODE_WRITE) &&
+	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
 		if (xfs_can_free_eofblocks(ip) &&
 		    !xfs_iflags_test(ip, XFS_IDIRTY_RELEASE)) {
 			/*