diff mbox series

[07/12] xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof

Message ID 20240910043949.3481298-8-hch@lst.de (mailing list archive)
State Not Applicable, archived
Headers show
Series [01/12] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release | expand

Commit Message

Christoph Hellwig Sept. 10, 2024, 4:39 a.m. UTC
xfs_file_write_zero_eof is the only caller of xfs_zero_range that does
not take XFS_MMAPLOCK_EXCL (aka the invalidate lock).  Currently that
is acrually the right thing, as an error in the iomap zeroing code will
also take the invalidate_lock to clean up, but to fix that deadlock we
need a consistent locking pattern first.

The only extra thing that XFS_MMAPLOCK_EXCL will lock out are read
pagefaults, which isn't really needed here, but also not actively
harmful.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c  | 8 +++++++-
 fs/xfs/xfs_iomap.c | 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong Sept. 17, 2024, 9:24 p.m. UTC | #1
On Tue, Sep 10, 2024 at 07:39:09AM +0300, Christoph Hellwig wrote:
> xfs_file_write_zero_eof is the only caller of xfs_zero_range that does
> not take XFS_MMAPLOCK_EXCL (aka the invalidate lock).  Currently that
> is acrually the right thing, as an error in the iomap zeroing code will

     actually

> also take the invalidate_lock to clean up, but to fix that deadlock we
> need a consistent locking pattern first.
> 
> The only extra thing that XFS_MMAPLOCK_EXCL will lock out are read
> pagefaults, which isn't really needed here, but also not actively
> harmful.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_file.c  | 8 +++++++-
>  fs/xfs/xfs_iomap.c | 2 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a30fda1985e6af..37dc26f51ace65 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -357,6 +357,7 @@ xfs_file_write_zero_eof(
>  {
>  	struct xfs_inode	*ip = XFS_I(iocb->ki_filp->f_mapping->host);
>  	loff_t			isize;
> +	int			error;
>  
>  	/*
>  	 * We need to serialise against EOF updates that occur in IO completions
> @@ -404,7 +405,12 @@ xfs_file_write_zero_eof(
>  	}
>  
>  	trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
> -	return xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
> +
> +	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> +	error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
> +	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);

Ah, ok, so we're taking the invalidate_lock so that we can't have page
faults that might add folios (or dirty existing ones) in the mapping.
We're the only ones who can access the page cache, and we're doing that
so that we can zero the folios between the old EOF and the start of the
write region.

Is that right?  Then
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> +
> +	return error;
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 1e11f48814c0d0..3c98d82c0ad0dc 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1435,6 +1435,8 @@ xfs_zero_range(
>  {
>  	struct inode		*inode = VFS_I(ip);
>  
> +	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
> +
>  	if (IS_DAX(inode))
>  		return dax_zero_range(inode, pos, len, did_zero,
>  				      &xfs_dax_write_iomap_ops);
> -- 
> 2.45.2
> 
>
Christoph Hellwig Sept. 18, 2024, 5:10 a.m. UTC | #2
On Tue, Sep 17, 2024 at 02:24:27PM -0700, Darrick J. Wong wrote:
> Ah, ok, so we're taking the invalidate_lock so that we can't have page
> faults that might add folios (or dirty existing ones) in the mapping.
> We're the only ones who can access the page cache, and we're doing that
> so that we can zero the folios between the old EOF and the start of the
> write region.
> 
> Is that right?  Then

Yes.

We might eventually also relax this a bit to only take the look if
we actually zero anything, but for that we need to do the work to
turn the iomap iterators inside out first.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a30fda1985e6af..37dc26f51ace65 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -357,6 +357,7 @@  xfs_file_write_zero_eof(
 {
 	struct xfs_inode	*ip = XFS_I(iocb->ki_filp->f_mapping->host);
 	loff_t			isize;
+	int			error;
 
 	/*
 	 * We need to serialise against EOF updates that occur in IO completions
@@ -404,7 +405,12 @@  xfs_file_write_zero_eof(
 	}
 
 	trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
-	return xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
+
+	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+	error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
+	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
+
+	return error;
 }
 
 /*
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 1e11f48814c0d0..3c98d82c0ad0dc 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1435,6 +1435,8 @@  xfs_zero_range(
 {
 	struct inode		*inode = VFS_I(ip);
 
+	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
+
 	if (IS_DAX(inode))
 		return dax_zero_range(inode, pos, len, did_zero,
 				      &xfs_dax_write_iomap_ops);