[v2,1/5] xfs: eof trim writeback mapping as soon as it is cached

Message ID 20190117192004.49346-2-bfoster@redhat.com (mailing list archive)
State Superseded
Series xfs: properly invalidate cached writeback mapping

Commit Message

Brian Foster Jan. 17, 2019, 7:20 p.m. UTC
The cached writeback mapping is EOF trimmed to try and avoid races
between post-eof block management and writeback that result in
sending cached data to a stale location. The cached mapping is
currently trimmed on the validation check, which leaves a race
window between the time the mapping is cached and when it is trimmed
against the current inode size.

For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform
memory allocations, etc. in the writeback codepath before the
associated mapping is eventually trimmed to i_size. This leaves
enough time for a post-eof truncate and file append before the
cached mapping is trimmed. The former event essentially invalidates
a range of the cached mapping and the latter bumps the inode size
such that the trim on the next writepage event won't trim all of the
invalid blocks. fstest generic/464 reproduces this scenario
occasionally and causes a lost writeback and stale delalloc blocks
warning on inode inactivation.

To work around this problem, trim the cached writeback mapping as
soon as it is cached in addition to on subsequent validation checks.
This is a minor tweak to tighten the race window as much as possible
until a proper invalidation mechanism is available.

Fixes: 40214d128e07 ("xfs: trim writepage mapping to within eof")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Allison Henderson Jan. 18, 2019, 5:29 a.m. UTC | #1
On 1/17/19 12:20 PM, Brian Foster wrote:
> The cached writeback mapping is EOF trimmed to try and avoid races
> between post-eof block management and writeback that result in
> sending cached data to a stale location. The cached mapping is
> currently trimmed on the validation check, which leaves a race
> window between the time the mapping is cached and when it is trimmed
> against the current inode size.
> 
> For example, if a new mapping is cached by delalloc conversion on a
> blocksize == page size fs, we could cycle various locks, perform
> memory allocations, etc.  in the writeback codepath before the
> associated mapping is eventually trimmed to i_size. This leaves
> enough time for a post-eof truncate and file append before the
> cached mapping is trimmed. The former event essentially invalidates
> a range of the cached mapping and the latter bumps the inode size
> such that the trim on the next writepage event won't trim all of the
> invalid blocks. fstest generic/464 reproduces this scenario
> occasionally and causes a lost writeback and stale delalloc blocks
> warning on inode inactivation.
> 
> To work around this problem, trim the cached writeback mapping as
> soon as it is cached in addition to on subsequent validation checks.
> This is a minor tweak to tighten the race window as much as possible
> until a proper invalidation mechanism is available.
> 
> Fixes: 40214d128e07 ("xfs: trim writepage mapping to within eof")
> Cc: <stable@vger.kernel.org> # v4.14+
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>   fs/xfs/xfs_aops.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 338b9d9984e0..d9048bcea49c 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -449,6 +449,7 @@ xfs_map_blocks(
>   	}
>   
>   	wpc->imap = imap;
> +	xfs_trim_extent_eof(&wpc->imap, ip);
>   	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
>   	return 0;
>   allocate_blocks:
> @@ -459,6 +460,7 @@ xfs_map_blocks(
>   	ASSERT(whichfork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
>   	       imap.br_startoff + imap.br_blockcount <= cow_fsb);
>   	wpc->imap = imap;
> +	xfs_trim_extent_eof(&wpc->imap, ip);
>   	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->io_type, &imap);
>   	return 0;
>   }
> 

Looks ok, you can add my review:
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

Christoph Hellwig Jan. 18, 2019, 11:47 a.m. UTC | #2
On Thu, Jan 17, 2019 at 02:20:00PM -0500, Brian Foster wrote:
> The cached writeback mapping is EOF trimmed to try and avoid races
> between post-eof block management and writeback that result in
> sending cached data to a stale location. The cached mapping is
> currently trimmed on the validation check, which leaves a race
> window between the time the mapping is cached and when it is trimmed
> against the current inode size.
> 
> For example, if a new mapping is cached by delalloc conversion on a
> blocksize == page size fs, we could cycle various locks, perform
> memory allocations, etc.  in the writeback codepath before the
> associated mapping is eventually trimmed to i_size. This leaves
> enough time for a post-eof truncate and file append before the
> cached mapping is trimmed. The former event essentially invalidates
> a range of the cached mapping and the latter bumps the inode size
> such that the trim on the next writepage event won't trim all of the
> invalid blocks. fstest generic/464 reproduces this scenario
> occasionally and causes a lost writeback and stale delalloc blocks
> warning on inode inactivation.
> 
> To work around this problem, trim the cached writeback mapping as
> soon as it is cached in addition to on subsequent validation checks.
> This is a minor tweak to tighten the race window as much as possible
> until a proper invalidation mechanism is available.
> 
> Fixes: 40214d128e07 ("xfs: trim writepage mapping to within eof")

I don't think it fixes that commit, but rather fixes more aspects of
the issue that commit tried to fix.

Otherwise this looks fine as a band-aid fix:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Patch

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 338b9d9984e0..d9048bcea49c 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -449,6 +449,7 @@  xfs_map_blocks(
 	}
 
 	wpc->imap = imap;
+	xfs_trim_extent_eof(&wpc->imap, ip);
 	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
 	return 0;
 allocate_blocks:
@@ -459,6 +460,7 @@  xfs_map_blocks(
 	ASSERT(whichfork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
 	       imap.br_startoff + imap.br_blockcount <= cow_fsb);
 	wpc->imap = imap;
+	xfs_trim_extent_eof(&wpc->imap, ip);
 	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->io_type, &imap);
 	return 0;
 }