
[10/10] xfs: retry COW fork delalloc conversion when no

Message ID 20190215144725.8894-11-hch@lst.de (mailing list archive)
State Accepted, archived
Series [01/10] xfs: remove the io_type field from the writeback context and ioend

Commit Message

Christoph Hellwig Feb. 15, 2019, 2:47 p.m. UTC
While we can only truncate a block under the page lock for the current
page, there is no high-level synchronization for moving extents from the
COW to the data fork.  This means that for example we can have another
thread doing a direct I/O completion that moves extents from the COW to
the data fork race with writeback.  While this race is very hard to hit,
the always_cow mode seems to reproduce it reasonably well, and the race
also exists without it.  Because of that there is a chance that a delalloc
conversion for the COW fork might not find any extents to convert.  In
that case we should retry the whole block lookup and now find the blocks
in the data fork.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Darrick J. Wong Feb. 15, 2019, 11:32 p.m. UTC | #1
On Fri, Feb 15, 2019 at 03:47:25PM +0100, Christoph Hellwig wrote:
> While we can only truncate a block under the page lock for the current
> page, there is no high-level synchronization for moving extents from the
> COW to the data fork.  This means that for example we can have another
> thread doing a direct I/O completion that moves extents from the COW to
> the data fork race with writeback.  While this race is very hard to hit,
> the always_cow mode seems to reproduce it reasonably well, and the race
> also exists without it.  Because of that there is a chance that a delalloc
> conversion for the COW fork might not find any extents to convert.  In
> that case we should retry the whole block lookup and now find the blocks
> in the data fork.

<thinking aloud mode>

I /think/ the way that this series (+ Brian's before that) solve the
truncate/writeback race is that we now only convert existing delalloc
reservations to real extents when we're preparing to do writeback;
_writepage_map only cares about the mapping of the offset_fsb that it
happens to be looping right now (because the page lock serializes with
page cache truncate/punch); and we use the new sequence counters for
both the data and cow forks to decide when our cached mapping might be
invalid and therefore we need to get a new mapping?

Therefore we don't need to keep calling _trim_extent_eof or checking
offset against i_size or any of those other games because writeback
won't go allocating blocks into holes that are being punched and now we
have an explicit mechanism to invalidate wpc->imap instead of the
scattered detection code we had before?

If that's true, then:

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
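
The invalidation scheme asked about above (per-fork sequence counters
that decide when a cached mapping must be looked up again) can be
illustrated with a small user-space model.  This is only a sketch under
stated assumptions: every seq_* name is hypothetical, and the check is a
simplification that treats any change to either fork as invalidating the
cached mapping.  The kernel's xfs_imap_valid() may apply additional
special cases that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
enum seq_which { SEQ_DATA_FORK, SEQ_COW_FORK };

struct seq_inode {
	unsigned int	data_seq;	/* bumped on every data fork extent change */
	unsigned int	cow_seq;	/* bumped on every COW fork extent change */
};

struct seq_wpc {
	uint64_t	start_fsb;	/* first block covered by the cached mapping */
	uint64_t	count_fsb;	/* number of blocks covered */
	enum seq_which	fork;		/* fork the mapping was read from */
	unsigned int	data_seq;	/* data fork counter sampled at lookup time */
	unsigned int	cow_seq;	/* COW fork counter sampled at lookup time */
};

/* Cache a mapping together with the counters that were current at lookup. */
static void
seq_cache_mapping(struct seq_wpc *wpc, const struct seq_inode *ip,
		  uint64_t start_fsb, uint64_t count_fsb, enum seq_which fork)
{
	wpc->start_fsb = start_fsb;
	wpc->count_fsb = count_fsb;
	wpc->fork = fork;
	wpc->data_seq = ip->data_seq;
	wpc->cow_seq = ip->cow_seq;
}

/*
 * Simplified validity check (assumption): the cached mapping is reused only
 * if it covers offset_fsb and neither fork changed since it was cached.
 */
static bool
seq_imap_valid(const struct seq_wpc *wpc, const struct seq_inode *ip,
	       uint64_t offset_fsb)
{
	if (offset_fsb < wpc->start_fsb ||
	    offset_fsb >= wpc->start_fsb + wpc->count_fsb)
		return false;
	if (wpc->data_seq != ip->data_seq)
		return false;
	if (wpc->cow_seq != ip->cow_seq)
		return false;
	return true;
}
```

In this model a writer that modifies either fork increments the matching
counter, and the next seq_imap_valid() call for the cached mapping fails,
which forces a fresh lookup instead of relying on scattered EOF checks.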

Christoph Hellwig Feb. 18, 2019, 9:09 a.m. UTC | #2
On Fri, Feb 15, 2019 at 03:32:25PM -0800, Darrick J. Wong wrote:
> I /think/ the way that this series (+ Brian's before that) solve the
> truncate/writeback race is that we now only convert existing delalloc
> reservations to real extents when we're preparing to do writeback;
> _writepage_map only cares about the mapping of the offset_fsb that it
> happens to be looping right now (because the page lock serializes with
> page cache truncate/punch); and we use the new sequence counters for
> both the data and cow forks to decide when our cached mapping might be
> invalid and therefore we need to get a new mapping?

Well, we already tried to do that before, we just weren't all that good
at it.  The big difference is that the delalloc conversion now doesn't
blindly reuse the range looked up a long time before under a different
ilock critical section, but instead just uses that as a hint and
only converts the extent that the writeback offset falls into, and
only does so if it actually still is in delalloc state.
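
The behaviour described above (treat the earlier lookup as a hint,
convert only the extent that the writeback offset falls into, and only if
it is still delalloc) can be sketched in user-space C.  All conv_* names
are hypothetical, the fork is modelled as a flat array rather than the
kernel's extent tree, and returning -EAGAIN when no extent covers the
offset is an assumption taken from the retry comment in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types for illustration only. */
enum conv_state { CONV_DELALLOC, CONV_REAL };

struct conv_extent {
	uint64_t	start_fsb;
	uint64_t	count_fsb;
	enum conv_state	state;
};

struct conv_fork {
	struct conv_extent	*extents;
	size_t			nr;
};

/* Find the extent covering offset_fsb in the fork as it looks right now. */
static struct conv_extent *
conv_lookup(struct conv_fork *fork, uint64_t offset_fsb)
{
	for (size_t i = 0; i < fork->nr; i++) {
		struct conv_extent *ext = &fork->extents[i];

		if (offset_fsb >= ext->start_fsb &&
		    offset_fsb < ext->start_fsb + ext->count_fsb)
			return ext;
	}
	return NULL;
}

/*
 * Convert only the extent that offset_fsb falls into.  Nothing from an
 * earlier lookup is trusted: the extent is looked up again here.
 */
static int
conv_convert_delalloc(struct conv_fork *fork, uint64_t offset_fsb,
		      struct conv_extent *out)
{
	struct conv_extent *ext = conv_lookup(fork, offset_fsb);

	if (!ext)
		return -EAGAIN;		/* extent vanished: caller may retry */
	if (ext->state == CONV_DELALLOC)
		ext->state = CONV_REAL;	/* stand-in for the real allocation */
	/* An extent that is already real is returned unchanged. */
	*out = *ext;
	return 0;
}
```

The point of the sketch is that the conversion step re-reads the current
state of the fork, so a stale range from an earlier lock cycle can never
cause blocks to be allocated into a hole.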

Darrick J. Wong Feb. 19, 2019, 5:19 a.m. UTC | #3
On Mon, Feb 18, 2019 at 10:09:42AM +0100, Christoph Hellwig wrote:
> On Fri, Feb 15, 2019 at 03:32:25PM -0800, Darrick J. Wong wrote:
> > I /think/ the way that this series (+ Brian's before that) solve the
> > truncate/writeback race is that we now only convert existing delalloc
> > reservations to real extents when we're preparing to do writeback;
> > _writepage_map only cares about the mapping of the offset_fsb that it
> > happens to be looping right now (because the page lock serializes with
> > page cache truncate/punch); and we use the new sequence counters for
> > both the data and cow forks to decide when our cached mapping might be
> > invalid and therefore we need to get a new mapping?
> 
> Well, we already tried to do that before, we just weren't all that good
> at it.  The big difference is that the delalloc conversion now doesn't
> blindly reuse the range looked up a long time before under a different
> ilock critical section, but instead just uses that as a hint and
> only converts the extent that the writeback offset falls into, and
> only does so if it actually still is in delalloc state.

Got it.  I'll pull this in if I don't hear any loud yelling. :)

(FWIW it tested ok over the weekend.)

--D

Patch

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a6abb7125203..2ed8733eca49 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -334,7 +334,8 @@ xfs_imap_valid(
  * extent that maps offset_fsb in wpc->imap.
  *
  * The current page is held locked so nothing could have removed the block
- * backing offset_fsb.
+ * backing offset_fsb, although it could have moved from the COW to the data
+ * fork by another thread.
  */
 static int
 xfs_convert_blocks(
@@ -375,6 +376,7 @@ xfs_map_blocks(
 	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
 	struct xfs_bmbt_irec	imap;
 	struct xfs_iext_cursor	icur;
+	int			retries = 0;
 	int			error = 0;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
@@ -404,6 +406,7 @@ xfs_map_blocks(
 	 * into real extents.  If we return without a valid map, it means we
 	 * landed in a hole and we skip the block.
 	 */
+retry:
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
@@ -471,8 +474,19 @@ xfs_map_blocks(
 	return 0;
 allocate_blocks:
 	error = xfs_convert_blocks(wpc, ip, offset_fsb);
-	if (error)
+	if (error) {
+		/*
+		 * If we failed to find the extent in the COW fork we might have
+		 * raced with a COW to data fork conversion or truncate.
+		 * Restart the lookup to catch the extent in the data fork for
+		 * the former case, but prevent additional retries to avoid
+		 * looping forever for the latter case.
+		 */
+		if (error == -EAGAIN && wpc->fork == XFS_COW_FORK && !retries++)
+			goto retry;
+		ASSERT(error != -EAGAIN);
 		return error;
+	}
 
 	/*
 	 * Due to merging the return real extent might be larger than the
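
The retry-once control flow added by the last hunk can be exercised
outside the kernel with a small model.  This is a sketch, not the kernel
code: the retry_* names are hypothetical, fork contents are reduced to two
booleans, and the conversion step is a function pointer so that the two
cases named in the comment (an extent that moved to the data fork, and a
conversion that keeps failing) can be simulated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model types for illustration only. */
enum retry_fork { RETRY_DATA_FORK, RETRY_COW_FORK };

struct retry_inode {
	bool	in_cow;		/* block has an extent in the COW fork */
	bool	in_data;	/* block has an extent in the data fork */
	int	lookups;	/* number of lookup passes performed */
	int	(*convert)(struct retry_inode *ip, enum retry_fork fork);
};

struct retry_result {
	bool		mapped;	/* false means the block is in a hole */
	enum retry_fork	fork;
};

/* Simulates another thread moving the extent from the COW to the data fork. */
static int
retry_convert_racing(struct retry_inode *ip, enum retry_fork fork)
{
	if (fork == RETRY_COW_FORK && ip->in_cow) {
		ip->in_cow = false;
		ip->in_data = true;
		return -EAGAIN;
	}
	return 0;
}

/* Simulates a conversion that never succeeds, to show the retry bound. */
static int
retry_convert_stuck(struct retry_inode *ip, enum retry_fork fork)
{
	(void)ip;
	(void)fork;
	return -EAGAIN;
}

static int
retry_map_block(struct retry_inode *ip, struct retry_result *res)
{
	int	retries = 0;
	int	error;

retry:
	ip->lookups++;
	res->mapped = false;
	if (ip->in_cow)
		res->fork = RETRY_COW_FORK;
	else if (ip->in_data)
		res->fork = RETRY_DATA_FORK;
	else
		return 0;	/* hole: nothing to map */

	error = ip->convert(ip, res->fork);
	if (error) {
		/* Same condition as the patch: one retry, COW fork only. */
		if (error == -EAGAIN && res->fork == RETRY_COW_FORK &&
		    !retries++)
			goto retry;
		return error;
	}
	res->mapped = true;
	return 0;
}
```

With retry_convert_racing the second pass finds the block in the data fork
and succeeds after two lookups.  With retry_convert_stuck the model gives
up after the second pass and returns -EAGAIN, which is the situation the
ASSERT in the patch is meant to flag on debug builds.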