Message ID | 20190215144725.8894-11-hch@lst.de (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
Series | [01/10] xfs: remove the io_type field from the writeback context and ioend | expand |
On Fri, Feb 15, 2019 at 03:47:25PM +0100, Christoph Hellwig wrote: > While we can only truncate a block under the page lock for the current > page, there is no high-level synchronization for moving extents from the > COW to the data fork. This means that for example we can have another > thread doing a direct I/O completion that moves extents from the COW to > the data fork race with writeback. While this race is very hard to hit > the always_cow seems to reproduce it reasonably well, and it also exists > without that. Because of that there is a chance that a delalloc > conversion for the COW fork might not find any extents to convert. In > that case we should retry the whole block lookup and now find the blocks > in the data fork. <thinking aloud mode> I /think/ the way that this series (+ Brian's before that) solve the truncate/writeback race is that we now only convert existing delalloc reservations to real extents when we're preparing to do writeback; _writepage_map only cares about the mapping of the offset_fsb that it happens to be looping right now (because the page lock serializes with page cache truncate/punch); and we use the new sequence counters for both the data and cow forks to decide when our cached mapping might be invalid and therefore we need to get a new mapping? Therefore we don't need to keep calling _trim_extent_eof or checking offset against i_size or any of those other games because writeback won't go allocating blocks into holes that are being punched and now we have an explicit mechanism to invalidate wpc->imap instead of the scattered detection code we had before? If that's true, then: Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/xfs_aops.c | 18 ++++++++++++++++-- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index a6abb7125203..2ed8733eca49 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -334,7 +334,8 @@ xfs_imap_valid( > * extent that maps offset_fsb in wpc->imap. > * > * The current page is held locked so nothing could have removed the block > - * backing offset_fsb. > + * backing offset_fsb, although it could have moved from the COW to the data > + * fork by another thread. > */ > static int > xfs_convert_blocks( > @@ -375,6 +376,7 @@ xfs_map_blocks( > xfs_fileoff_t cow_fsb = NULLFILEOFF; > struct xfs_bmbt_irec imap; > struct xfs_iext_cursor icur; > + int retries = 0; > int error = 0; > > if (XFS_FORCED_SHUTDOWN(mp)) > @@ -404,6 +406,7 @@ xfs_map_blocks( > * into real extents. If we return without a valid map, it means we > * landed in a hole and we skip the block. > */ > +retry: > xfs_ilock(ip, XFS_ILOCK_SHARED); > ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE || > (ip->i_df.if_flags & XFS_IFEXTENTS)); > @@ -471,8 +474,19 @@ xfs_map_blocks( > return 0; > allocate_blocks: > error = xfs_convert_blocks(wpc, ip, offset_fsb); > - if (error) > + if (error) { > + /* > + * If we failed to find the extent in the COW fork we might have > + * raced with a COW to data fork conversion or truncate. > + * Restart the lookup to catch the extent in the data fork for > + * the former case, but prevent additional retries to avoid > + * looping forever for the latter case. > + */ > + if (error == -EAGAIN && wpc->fork == XFS_COW_FORK && !retries++) > + goto retry; > + ASSERT(error != -EAGAIN); > return error; > + } > > /* > * Due to merging the return real extent might be larger than the > -- > 2.20.1 >
On Fri, Feb 15, 2019 at 03:32:25PM -0800, Darrick J. Wong wrote: > I /think/ the way that this series (+ Brian's before that) solve the > truncate/writeback race is that we now only convert existing delalloc > reservations to real extents when we're preparing to do writeback; > _writepage_map only cares about the mapping of the offset_fsb that it > happens to be looping right now (because the page lock serializes with > page cache truncate/punch); and we use the new sequence counters for > both the data and cow forks to decide when our cached mapping might be > invalid and therefore we need to get a new mapping? Well, we already tried to do that before, we just weren't all that good at it. The big difference is that the delalloc conversion now doesn't blindly reuse the range looked up a long time before under a different ilock critical section, but instead just uses that as a hint and only converts the extent that the writeback offset falls into, and only does so if it actually still is in delalloc state.
On Mon, Feb 18, 2019 at 10:09:42AM +0100, Christoph Hellwig wrote: > On Fri, Feb 15, 2019 at 03:32:25PM -0800, Darrick J. Wong wrote: > > I /think/ the way that this series (+ Brian's before that) solve the > > truncate/writeback race is that we now only convert existing delalloc > > reservations to real extents when we're preparing to do writeback; > > _writepage_map only cares about the mapping of the offset_fsb that it > > happens to be looping right now (because the page lock serializes with > > page cache truncate/punch); and we use the new sequence counters for > > both the data and cow forks to decide when our cached mapping might be > > invalid and therefore we need to get a new mapping? > > Well, we already tried to do that before, we just weren't all that good > at it. The big difference is that the delalloc conversion now doesn't > blindly reuse the range looked up a long time before under a different > ilock critical section, but instead just uses that as a hint and > only converts the extent that the writeback offset falls into, and > only does so if it actually still is in delalloc state. Got it. I'll pull this in if I don't hear any loud yelling. :) (FWIW it tested ok over the weekend.) --D
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index a6abb7125203..2ed8733eca49 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -334,7 +334,8 @@ xfs_imap_valid( * extent that maps offset_fsb in wpc->imap. * * The current page is held locked so nothing could have removed the block - * backing offset_fsb. + * backing offset_fsb, although it could have moved from the COW to the data + * fork by another thread. */ static int xfs_convert_blocks( @@ -375,6 +376,7 @@ xfs_map_blocks( xfs_fileoff_t cow_fsb = NULLFILEOFF; struct xfs_bmbt_irec imap; struct xfs_iext_cursor icur; + int retries = 0; int error = 0; if (XFS_FORCED_SHUTDOWN(mp)) @@ -404,6 +406,7 @@ xfs_map_blocks( * into real extents. If we return without a valid map, it means we * landed in a hole and we skip the block. */ +retry: xfs_ilock(ip, XFS_ILOCK_SHARED); ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE || (ip->i_df.if_flags & XFS_IFEXTENTS)); @@ -471,8 +474,19 @@ xfs_map_blocks( return 0; allocate_blocks: error = xfs_convert_blocks(wpc, ip, offset_fsb); - if (error) + if (error) { + /* + * If we failed to find the extent in the COW fork we might have + * raced with a COW to data fork conversion or truncate. + * Restart the lookup to catch the extent in the data fork for + * the former case, but prevent additional retries to avoid + * looping forever for the latter case. + */ + if (error == -EAGAIN && wpc->fork == XFS_COW_FORK && !retries++) + goto retry; + ASSERT(error != -EAGAIN); return error; + } /* * Due to merging the return real extent might be larger than the
While we can only truncate a block under the page lock for the current page, there is no high-level synchronization for moving extents from the COW to the data fork. This means that for example we can have another thread doing a direct I/O completion that moves extents from the COW to the data fork race with writeback. While this race is very hard to hit the always_cow seems to reproduce it reasonably well, and it also exists without that. Because of that there is a chance that a delalloc conversion for the COW fork might not find any extents to convert. In that case we should retry the whole block lookup and now find the blocks in the data fork. Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_aops.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)