Message ID | 1484156571-65403-1-git-send-email-bfoster@redhat.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
> if (need_iolock) { > if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) > return -EAGAIN; > } > + inode_dio_wait(VFS_I(ip)); inode_dio_wait generally is only safe to call with i_rwsem held exclsuively, so if we'd need the call for the !need_iolock this would be broken. Fortunately we don't even need the call in that case, so this should be safe. I'd still prefer to move the inode_dio_wait call into the need_iolock block to make that clear, though. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 12, 2017 at 05:56:03AM -0800, Christoph Hellwig wrote: > > if (need_iolock) { > > if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) > > return -EAGAIN; > > } > > + inode_dio_wait(VFS_I(ip)); > > inode_dio_wait generally is only safe to call with i_rwsem held > exclsuively, so if we'd need the call for the !need_iolock this would > be broken. Fortunately we don't even need the call in that case, so > this should be safe. I'd still prefer to move the inode_dio_wait call > into the need_iolock block to make that clear, though. !need_iolock means the iolock is already held. I guess the name is kind of confusing. !need_iolock doesn't mean that the lock is unnecessary, it just means that we're calling from a context where it's already held. See the xfs_icache_free_eofblocks() call from xfs_file_buffered_aio_write() for reference. I suppose I could add an ASSERT(xfs_isilocked()) after that block to better document that.. Brian -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 12, 2017 at 09:05:59AM -0500, Brian Foster wrote: > !need_iolock means the iolock is already held. I guess the name is kind > of confusing. !need_iolock doesn't mean that the lock is unnecessary, it > just means that we're calling from a context where it's already held. > See the xfs_icache_free_eofblocks() call from > xfs_file_buffered_aio_write() for reference. > > I suppose I could add an ASSERT(xfs_isilocked()) after that block to > better document that.. Yeah. In fact I'd prefer to kill that parameter at all, it's horrible. Instead we should always expect the lock and assert that it's held, and have the two current need_iolock = false callers take it manually. This will increase lock hold times a bit for the current need_iolock callers, but the most important one, xfs_release already does a previous unlocked xfs_can_free_eofblocks check, and xfs_inode_free_eofblocks is only called it the inode is tagged, so this should not be an issue (and if it is a xfs_can_free_eofblocks call comes to rescue). Btw, there is a comment incorrectly referring to xfs_free_eofblocks in xfs_inode_free_cowblocks while we're at it. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 12, 2017 at 11:33:15PM -0800, Christoph Hellwig wrote: > On Thu, Jan 12, 2017 at 09:05:59AM -0500, Brian Foster wrote: > > !need_iolock means the iolock is already held. I guess the name is kind > > of confusing. !need_iolock doesn't mean that the lock is unnecessary, it > > just means that we're calling from a context where it's already held. > > See the xfs_icache_free_eofblocks() call from > > xfs_file_buffered_aio_write() for reference. > > > > I suppose I could add an ASSERT(xfs_isilocked()) after that block to > > better document that.. > > Yeah. In fact I'd prefer to kill that parameter at all, it's horrible. > Instead we should always expect the lock and assert that it's held, > and have the two current need_iolock = false callers take it manually. > I may have lied about iolock holders.. I don't think we have the iolock in the xfs_inactive() case. That said, I don't think there's any harm in doing so there. It may have just been that way since we're breaking down the inode in that context. I agree that this is ugly. I'll try to kill off that param as suggested. > This will increase lock hold times a bit for the current need_iolock > callers, but the most important one, xfs_release already does a > previous unlocked xfs_can_free_eofblocks check, and > xfs_inode_free_eofblocks is only called it the inode is tagged, so this > should not be an issue (and if it is a xfs_can_free_eofblocks call > comes to rescue). > The need_iolock case is also currently a trylock, so I think we can retain that logic in the release case without being disruptive. > Btw, there is a comment incorrectly referring to xfs_free_eofblocks > in xfs_inode_free_cowblocks while we're at it. I would send that as a standalone patch because it's an entirely separate feature (and causes unnecessary backport churn, even though it's just a comment), so I'm not sure it's really worth fixing with this series. Brian > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index b9abce5..5844549 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -936,6 +936,10 @@ xfs_free_eofblocks( error = xfs_bmapi_read(ip, end_fsb, map_len, &imap, &nimaps, 0); xfs_iunlock(ip, XFS_ILOCK_SHARED); + /* + * If there are blocks after the end of file, truncate the file to its + * current size to free them up. + */ if (!error && (nimaps != 0) && (imap.br_startblock != HOLESTARTBLOCK || ip->i_delayed_blks)) { @@ -947,14 +951,14 @@ xfs_free_eofblocks( return error; /* - * There are blocks after the end of file. - * Free them up now by truncating the file to - * its current size. + * Grab the iolock if the caller hasn't done so and wait on + * direct I/O to ensure i_size has settled. */ if (need_iolock) { if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) return -EAGAIN; } + inode_dio_wait(VFS_I(ip)); error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
It's possible for post-eof blocks to end up being used for direct I/O writes. dio write performs an upfront unwritten extent allocation, sends the dio and then updates the inode size (if necessary) on write completion. If a file release occurs while a file extending dio write is in flight, it is possible to mistake the post-eof blocks for speculative preallocation and incorrectly truncate them from the inode. This means that the resulting dio write completion can discover a hole and allocate new blocks rather than perform unwritten extent conversion. This requires a strange mix of I/O and is thus not likely to reproduce in real world workloads. It is intermittently reproduced by generic/299. The error manifests as an assert failure due to transaction overrun because the aforementioned write completion transaction has only reserved enough blocks for btree operations: XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, \ file: fs/xfs//xfs_trans.c, line: 309 The root cause is that xfs_free_eofblocks() uses i_size to truncate post-eof blocks from the inode, but async, file extending direct writes do not update i_size until write completion, long after inode locks are dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode to the incorrect size. Update xfs_free_eoflbocks() to serialize against dio similar to how extending writes are serialized against i_size updates before post-eof block zeroing. Specifically, wait on dio once the iolock is acquired. This ensures that dio write completions have updated i_size before post-eof blocks are processed. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/xfs_bmap_util.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)