Message ID | 20190131075524.4769-5-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/11] FOLD: improve xfs_bmapi_delalloc | expand |
On Thu, Jan 31, 2019 at 08:55:17AM +0100, Christoph Hellwig wrote: > We already shortcut xfs_map_blocks for COW mappings, but there is just > as little reason to start writeback beyond i_size in the data fork. > > Note that this has to be just an optimization as hole punches can unmaps > block just like truncate, and we need to handle that case further down > in the low-level code. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/xfs_aops.c | 34 ++++++++++++++-------------------- > 1 file changed, 14 insertions(+), 20 deletions(-) > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 8bfb62d8776f..9c2a1947d5dd 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -348,6 +348,20 @@ xfs_map_blocks( > if (XFS_FORCED_SHUTDOWN(mp)) > return -EIO; > > + /* > + * If the offset is beyond the inode size, we know that we raced with > + * truncate. No point in doing calling any lower level code, just > + * return a hole so that the writeback code skips writeback for the > + * rest of the file. > + */ > + if (offset > i_size_read(inode)) { > + wpc->imap.br_startoff = offset_fsb; > + wpc->imap.br_blockcount = end_fsb - offset_fsb; > + wpc->imap.br_startblock = HOLESTARTBLOCK; > + wpc->imap.br_state = XFS_EXT_NORM; > + return 0; > + } > + The code looks fine, but I don't see any more value in this code than the similar code down in xfs_iomap_write_allocate(). The comment implies this skips writeback for the rest of the file, but AFACT the higher level page->index code in xfs_do_writepage() already does that. All these checks do is skip the remaining blocks in the current page. When you consider that we're most likely sending an I/O in the latter case either way, I'm curious why we'd bother to keep this around at all. Brian > /* > * COW fork blocks can overlap data fork blocks even if the blocks > * aren't shared. COW I/O always takes precedent, so we must always > @@ -388,26 +402,6 @@ xfs_map_blocks( > xfs_iunlock(ip, XFS_ILOCK_SHARED); > > wpc->fork = XFS_COW_FORK; > - > - /* > - * Truncate can race with writeback since writeback doesn't > - * take the iolock and truncate decreases the file size before > - * it starts truncating the pages between new_size and old_size. > - * Therefore, we can end up in the situation where writeback > - * gets a CoW fork mapping but the truncate makes the mapping > - * invalid and we end up in here trying to get a new mapping. > - * bail out here so that we simply never get a valid mapping > - * and so we drop the write altogether. The page truncation > - * will kill the contents anyway. > - */ > - if (offset > i_size_read(inode)) { > - wpc->imap.br_blockcount = end_fsb - offset_fsb; > - wpc->imap.br_startoff = offset_fsb; > - wpc->imap.br_startblock = HOLESTARTBLOCK; > - wpc->imap.br_state = XFS_EXT_NORM; > - return 0; > - } > - > goto allocate_blocks; > } > > -- > 2.20.1 >
On Thu, Jan 31, 2019 at 01:11:09PM -0500, Brian Foster wrote: > The code looks fine, but I don't see any more value in this code than > the similar code down in xfs_iomap_write_allocate(). The comment implies > this skips writeback for the rest of the file, but AFACT the higher > level page->index code in xfs_do_writepage() already does that. All > these checks do is skip the remaining blocks in the current page. When > you consider that we're most likely sending an I/O in the latter case > either way, I'm curious why we'd bother to keep this around at all. It just seems a little pointless to take locks and do a lookup in the extent tree. But I can try dropping it and re-run tests without it.
On Fri, Feb 01, 2019 at 08:25:20AM +0100, Christoph Hellwig wrote: > On Thu, Jan 31, 2019 at 01:11:09PM -0500, Brian Foster wrote: > > The code looks fine, but I don't see any more value in this code than > > the similar code down in xfs_iomap_write_allocate(). The comment implies > > this skips writeback for the rest of the file, but AFACT the higher > > level page->index code in xfs_do_writepage() already does that. All > > these checks do is skip the remaining blocks in the current page. When > > you consider that we're most likely sending an I/O in the latter case > > either way, I'm curious why we'd bother to keep this around at all. > > It just seems a little pointless to take locks and do a lookup in the > extent tree. But I can try dropping it and re-run tests without it. I agree that is pointless, but what's the point of saving lock cycles and an in-core extent lookup when we've already committed to sending an I/O? AFAICT we may also still have to cycle through however many pages are still referenced by the current pagevec. The racing truncate is going to wait on the higher level page lock and page writeback state before it ever gets to the point of contending on ilock. More broadly, this is a codepath that is historically brittle with regard to uncommon branches in the code, whether that be incorrect error handling (page discard), the similar truncate detection code in xfs_iomap_write_allocate() that clearly had broken -EAGAIN handling code for who knows how long, etc. IMO, this all stems from lack of good test coverage in the area of writeback failure. Unless I'm missing something where this code is required for correctness or somehow provides a measureable performance benefit, I think we're better off keeping this path as simple as possible. It should be fairly easy to provide test coverage for the page granularity truncate handling code (generic/524 may already cover this). I'm not so sure about that for the block granularity handling without having some kind of additional instrumentation. Brian
On Fri, Feb 01, 2019 at 07:46:42AM -0500, Brian Foster wrote: > Unless I'm missing something where this code is required for correctness > or somehow provides a measureable performance benefit, I think we're > better off keeping this path as simple as possible. It should be fairly > easy to provide test coverage for the page granularity truncate handling > code (generic/524 may already cover this). I'm not so sure about that > for the block granularity handling without having some kind of > additional instrumentation. It should not be required for correctness. I've done a first round of testing with the check removed and it seems to be doing fine.
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 8bfb62d8776f..9c2a1947d5dd 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -348,6 +348,20 @@ xfs_map_blocks( if (XFS_FORCED_SHUTDOWN(mp)) return -EIO; + /* + * If the offset is beyond the inode size, we know that we raced with + * truncate. No point in doing calling any lower level code, just + * return a hole so that the writeback code skips writeback for the + * rest of the file. + */ + if (offset > i_size_read(inode)) { + wpc->imap.br_startoff = offset_fsb; + wpc->imap.br_blockcount = end_fsb - offset_fsb; + wpc->imap.br_startblock = HOLESTARTBLOCK; + wpc->imap.br_state = XFS_EXT_NORM; + return 0; + } + /* * COW fork blocks can overlap data fork blocks even if the blocks * aren't shared. COW I/O always takes precedent, so we must always @@ -388,26 +402,6 @@ xfs_map_blocks( xfs_iunlock(ip, XFS_ILOCK_SHARED); wpc->fork = XFS_COW_FORK; - - /* - * Truncate can race with writeback since writeback doesn't - * take the iolock and truncate decreases the file size before - * it starts truncating the pages between new_size and old_size. - * Therefore, we can end up in the situation where writeback - * gets a CoW fork mapping but the truncate makes the mapping - * invalid and we end up in here trying to get a new mapping. - * bail out here so that we simply never get a valid mapping - * and so we drop the write altogether. The page truncation - * will kill the contents anyway. - */ - if (offset > i_size_read(inode)) { - wpc->imap.br_blockcount = end_fsb - offset_fsb; - wpc->imap.br_startoff = offset_fsb; - wpc->imap.br_startblock = HOLESTARTBLOCK; - wpc->imap.br_state = XFS_EXT_NORM; - return 0; - } - goto allocate_blocks; }
We already shortcut xfs_map_blocks for COW mappings, but there is just as little reason to start writeback beyond i_size in the data fork. Note that this has to be just an optimization as hole punches can unmaps block just like truncate, and we need to handle that case further down in the low-level code. Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_aops.c | 34 ++++++++++++++-------------------- 1 file changed, 14 insertions(+), 20 deletions(-)