diff mbox series

[04/11] xfs: don't try to map blocks beyond i_size in writeback

Message ID 20190131075524.4769-5-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [01/11] FOLD: improve xfs_bmapi_delalloc | expand

Commit Message

Christoph Hellwig Jan. 31, 2019, 7:55 a.m. UTC
We already shortcut xfs_map_blocks for COW mappings, but there is just
as little reason to start writeback beyond i_size in the data fork.

Note that this has to be just an optimization as hole punches can unmaps
block just like truncate, and we need to handle that case further down
in the low-level code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

Comments

Brian Foster Jan. 31, 2019, 6:11 p.m. UTC | #1
On Thu, Jan 31, 2019 at 08:55:17AM +0100, Christoph Hellwig wrote:
> We already shortcut xfs_map_blocks for COW mappings, but there is just
> as little reason to start writeback beyond i_size in the data fork.
> 
> Note that this has to be just an optimization as hole punches can unmaps
> block just like truncate, and we need to handle that case further down
> in the low-level code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_aops.c | 34 ++++++++++++++--------------------
>  1 file changed, 14 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 8bfb62d8776f..9c2a1947d5dd 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -348,6 +348,20 @@ xfs_map_blocks(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> +	/*
> +	 * If the offset is beyond the inode size, we know that we raced with
> +	 * truncate.  No point in doing calling any lower level code, just
> +	 * return a hole so that the writeback code skips writeback for the
> +	 * rest of the file.
> +	 */
> +	if (offset > i_size_read(inode)) {
> +		wpc->imap.br_startoff = offset_fsb;
> +		wpc->imap.br_blockcount = end_fsb - offset_fsb;
> +		wpc->imap.br_startblock = HOLESTARTBLOCK;
> +		wpc->imap.br_state = XFS_EXT_NORM;
> +		return 0;
> +	}
> +

The code looks fine, but I don't see any more value in this code than
the similar code down in xfs_iomap_write_allocate(). The comment implies
this skips writeback for the rest of the file, but AFACT the higher
level page->index code in xfs_do_writepage() already does that. All
these checks do is skip the remaining blocks in the current page. When
you consider that we're most likely sending an I/O in the latter case
either way, I'm curious why we'd bother to keep this around at all.

Brian

>  	/*
>  	 * COW fork blocks can overlap data fork blocks even if the blocks
>  	 * aren't shared.  COW I/O always takes precedent, so we must always
> @@ -388,26 +402,6 @@ xfs_map_blocks(
>  		xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  
>  		wpc->fork = XFS_COW_FORK;
> -
> -		/*
> -		 * Truncate can race with writeback since writeback doesn't
> -		 * take the iolock and truncate decreases the file size before
> -		 * it starts truncating the pages between new_size and old_size.
> -		 * Therefore, we can end up in the situation where writeback
> -		 * gets a CoW fork mapping but the truncate makes the mapping
> -		 * invalid and we end up in here trying to get a new mapping.
> -		 * bail out here so that we simply never get a valid mapping
> -		 * and so we drop the write altogether.  The page truncation
> -		 * will kill the contents anyway.
> -		 */
> -		if (offset > i_size_read(inode)) {
> -			wpc->imap.br_blockcount = end_fsb - offset_fsb;
> -			wpc->imap.br_startoff = offset_fsb;
> -			wpc->imap.br_startblock = HOLESTARTBLOCK;
> -			wpc->imap.br_state = XFS_EXT_NORM;
> -			return 0;
> -		}
> -
>  		goto allocate_blocks;
>  	}
>  
> -- 
> 2.20.1
>
Christoph Hellwig Feb. 1, 2019, 7:25 a.m. UTC | #2
On Thu, Jan 31, 2019 at 01:11:09PM -0500, Brian Foster wrote:
> The code looks fine, but I don't see any more value in this code than
> the similar code down in xfs_iomap_write_allocate(). The comment implies
> this skips writeback for the rest of the file, but AFACT the higher
> level page->index code in xfs_do_writepage() already does that. All
> these checks do is skip the remaining blocks in the current page. When
> you consider that we're most likely sending an I/O in the latter case
> either way, I'm curious why we'd bother to keep this around at all.

It just seems a little pointless to take locks and do a lookup in the
extent tree.  But I can try dropping it and re-run tests without it.
Brian Foster Feb. 1, 2019, 12:46 p.m. UTC | #3
On Fri, Feb 01, 2019 at 08:25:20AM +0100, Christoph Hellwig wrote:
> On Thu, Jan 31, 2019 at 01:11:09PM -0500, Brian Foster wrote:
> > The code looks fine, but I don't see any more value in this code than
> > the similar code down in xfs_iomap_write_allocate(). The comment implies
> > this skips writeback for the rest of the file, but AFACT the higher
> > level page->index code in xfs_do_writepage() already does that. All
> > these checks do is skip the remaining blocks in the current page. When
> > you consider that we're most likely sending an I/O in the latter case
> > either way, I'm curious why we'd bother to keep this around at all.
> 
> It just seems a little pointless to take locks and do a lookup in the
> extent tree.  But I can try dropping it and re-run tests without it.

I agree that is pointless, but what's the point of saving lock cycles
and an in-core extent lookup when we've already committed to sending an
I/O? AFAICT we may also still have to cycle through however many pages
are still referenced by the current pagevec. The racing truncate is
going to wait on the higher level page lock and page writeback state
before it ever gets to the point of contending on ilock.

More broadly, this is a codepath that is historically brittle with
regard to uncommon branches in the code, whether that be incorrect error
handling (page discard), the similar truncate detection code in
xfs_iomap_write_allocate() that clearly had broken -EAGAIN handling code
for who knows how long, etc. IMO, this all stems from lack of good test
coverage in the area of writeback failure.

Unless I'm missing something where this code is required for correctness
or somehow provides a measureable performance benefit, I think we're
better off keeping this path as simple as possible. It should be fairly
easy to provide test coverage for the page granularity truncate handling
code (generic/524 may already cover this). I'm not so sure about that
for the block granularity handling without having some kind of
additional instrumentation.

Brian
Christoph Hellwig Feb. 1, 2019, 4:07 p.m. UTC | #4
On Fri, Feb 01, 2019 at 07:46:42AM -0500, Brian Foster wrote:
> Unless I'm missing something where this code is required for correctness
> or somehow provides a measureable performance benefit, I think we're
> better off keeping this path as simple as possible. It should be fairly
> easy to provide test coverage for the page granularity truncate handling
> code (generic/524 may already cover this). I'm not so sure about that
> for the block granularity handling without having some kind of
> additional instrumentation.

It should not be required for correctness.  I've done a first round
of testing with the check removed and it seems to be doing fine.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 8bfb62d8776f..9c2a1947d5dd 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -348,6 +348,20 @@  xfs_map_blocks(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	/*
+	 * If the offset is beyond the inode size, we know that we raced with
+	 * truncate.  No point in doing calling any lower level code, just
+	 * return a hole so that the writeback code skips writeback for the
+	 * rest of the file.
+	 */
+	if (offset > i_size_read(inode)) {
+		wpc->imap.br_startoff = offset_fsb;
+		wpc->imap.br_blockcount = end_fsb - offset_fsb;
+		wpc->imap.br_startblock = HOLESTARTBLOCK;
+		wpc->imap.br_state = XFS_EXT_NORM;
+		return 0;
+	}
+
 	/*
 	 * COW fork blocks can overlap data fork blocks even if the blocks
 	 * aren't shared.  COW I/O always takes precedent, so we must always
@@ -388,26 +402,6 @@  xfs_map_blocks(
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 		wpc->fork = XFS_COW_FORK;
-
-		/*
-		 * Truncate can race with writeback since writeback doesn't
-		 * take the iolock and truncate decreases the file size before
-		 * it starts truncating the pages between new_size and old_size.
-		 * Therefore, we can end up in the situation where writeback
-		 * gets a CoW fork mapping but the truncate makes the mapping
-		 * invalid and we end up in here trying to get a new mapping.
-		 * bail out here so that we simply never get a valid mapping
-		 * and so we drop the write altogether.  The page truncation
-		 * will kill the contents anyway.
-		 */
-		if (offset > i_size_read(inode)) {
-			wpc->imap.br_blockcount = end_fsb - offset_fsb;
-			wpc->imap.br_startoff = offset_fsb;
-			wpc->imap.br_startblock = HOLESTARTBLOCK;
-			wpc->imap.br_state = XFS_EXT_NORM;
-			return 0;
-		}
-
 		goto allocate_blocks;
 	}