[07/12] xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof

Message ID	20240910043949.3481298-8-hch@lst.de (mailing list archive)
State	New
Headers	show Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 672F320314; Tue, 10 Sep 2024 04:40:27 +0000 (UTC) From: Christoph Hellwig <hch@lst.de> To: Chandan Babu R <chandan.babu@oracle.com>, Christian Brauner <brauner@kernel.org>, "Darrick J. Wong" <djwong@kernel.org> Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 07/12] xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof Date: Tue, 10 Sep 2024 07:39:09 +0300 Message-ID: <20240910043949.3481298-8-hch@lst.de> In-Reply-To: <20240910043949.3481298-1-hch@lst.de> References: <20240910043949.3481298-1-hch@lst.de> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[01/12] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release \| expand [01/12] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release [02/12] iomap: improve shared block detection in iomap_unshare_iter [03/12] iomap: pass flags to iomap_file_buffered_write_punch_delalloc [04/12] iomap: pass the iomap to the punch callback [05/12] iomap: remove the iomap_file_buffered_write_punch_delalloc return value [06/12] xfs: factor out a xfs_file_write_zero_eof helper [07/12] xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof [08/12] iomap: zeroing already holds invalidate_lock [09/12] xfs: support the COW fork in xfs_bmap_punch_delalloc_range [10/12] xfs: share more code in xfs_buffered_write_iomap_begin [11/12] xfs: set IOMAP_F_SHARED for all COW fork allocations [12/12] xfs: punch delalloc extents from the COW fork for COW writes

Message ID

20240910043949.3481298-8-hch@lst.de (mailing list archive)

State

New

Headers

From: Christoph Hellwig <hch@lst.de>
To: Chandan Babu R <chandan.babu@oracle.com>,
	Christian Brauner <brauner@kernel.org>,
	"Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH 07/12] xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof
Date: Tue, 10 Sep 2024 07:39:09 +0300
Message-ID: <20240910043949.3481298-8-hch@lst.de>
In-Reply-To: <20240910043949.3481298-1-hch@lst.de>
References: <20240910043949.3481298-1-hch@lst.de>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

[01/12] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release | expand

Commit Message

Christoph Hellwig Sept. 10, 2024, 4:39 a.m. UTC

xfs_file_write_zero_eof is the only caller of xfs_zero_range that does
not take XFS_MMAPLOCK_EXCL (aka the invalidate lock).  Currently that
is acrually the right thing, as an error in the iomap zeroing code will
also take the invalidate_lock to clean up, but to fix that deadlock we
need a consistent locking pattern first.

The only extra thing that XFS_MMAPLOCK_EXCL will lock out are read
pagefaults, which isn't really needed here, but also not actively
harmful.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c  | 8 +++++++-
 fs/xfs/xfs_iomap.c | 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong Sept. 17, 2024, 9:24 p.m. UTC | #1

On Tue, Sep 10, 2024 at 07:39:09AM +0300, Christoph Hellwig wrote:
> xfs_file_write_zero_eof is the only caller of xfs_zero_range that does
> not take XFS_MMAPLOCK_EXCL (aka the invalidate lock).  Currently that
> is acrually the right thing, as an error in the iomap zeroing code will

     actually

> also take the invalidate_lock to clean up, but to fix that deadlock we
> need a consistent locking pattern first.
> 
> The only extra thing that XFS_MMAPLOCK_EXCL will lock out are read
> pagefaults, which isn't really needed here, but also not actively
> harmful.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_file.c  | 8 +++++++-
>  fs/xfs/xfs_iomap.c | 2 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a30fda1985e6af..37dc26f51ace65 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -357,6 +357,7 @@ xfs_file_write_zero_eof(
>  {
>  	struct xfs_inode	*ip = XFS_I(iocb->ki_filp->f_mapping->host);
>  	loff_t			isize;
> +	int			error;
>  
>  	/*
>  	 * We need to serialise against EOF updates that occur in IO completions
> @@ -404,7 +405,12 @@ xfs_file_write_zero_eof(
>  	}
>  
>  	trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
> -	return xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
> +
> +	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> +	error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
> +	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);

Ah, ok, so we're taking the invalidate_lock so that we can't have page
faults that might add folios (or dirty existing ones) in the mapping.
We're the only ones who can access the page cache, and we're doing that
so that we can zero the folios between the old EOF and the start of the
write region.

Is that right?  Then
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> +
> +	return error;
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 1e11f48814c0d0..3c98d82c0ad0dc 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1435,6 +1435,8 @@ xfs_zero_range(
>  {
>  	struct inode		*inode = VFS_I(ip);
>  
> +	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
> +
>  	if (IS_DAX(inode))
>  		return dax_zero_range(inode, pos, len, did_zero,
>  				      &xfs_dax_write_iomap_ops);
> -- 
> 2.45.2
> 
>

Christoph Hellwig Sept. 18, 2024, 5:10 a.m. UTC | #2

On Tue, Sep 17, 2024 at 02:24:27PM -0700, Darrick J. Wong wrote:
> Ah, ok, so we're taking the invalidate_lock so that we can't have page
> faults that might add folios (or dirty existing ones) in the mapping.
> We're the only ones who can access the page cache, and we're doing that
> so that we can zero the folios between the old EOF and the start of the
> write region.
> 
> Is that right?  Then

Yes.

We might eventually also relax this a bit to only take the look if
we actually zero anything, but for that we need to do the work to
turn the iomap iterators inside out first.

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a30fda1985e6af..37dc26f51ace65 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -357,6 +357,7 @@  xfs_file_write_zero_eof(
 {
 	struct xfs_inode	*ip = XFS_I(iocb->ki_filp->f_mapping->host);
 	loff_t			isize;
+	int			error;
 
 	/*
 	 * We need to serialise against EOF updates that occur in IO completions
@@ -404,7 +405,12 @@  xfs_file_write_zero_eof(
 	}
 
 	trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
-	return xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
+
+	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+	error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
+	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
+
+	return error;
 }
 
 /*
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 1e11f48814c0d0..3c98d82c0ad0dc 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1435,6 +1435,8 @@  xfs_zero_range(
 {
 	struct inode		*inode = VFS_I(ip);
 
+	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
+
 	if (IS_DAX(inode))
 		return dax_zero_range(inode, pos, len, did_zero,
 				      &xfs_dax_write_iomap_ops);

[07/12] xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof

Commit Message

Comments

Patch