Message ID | 20240910043949.3481298-8-hch@lst.de (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [01/12] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release | expand |
On Tue, Sep 10, 2024 at 07:39:09AM +0300, Christoph Hellwig wrote: > xfs_file_write_zero_eof is the only caller of xfs_zero_range that does > not take XFS_MMAPLOCK_EXCL (aka the invalidate lock). Currently that > is acrually the right thing, as an error in the iomap zeroing code will actually > also take the invalidate_lock to clean up, but to fix that deadlock we > need a consistent locking pattern first. > > The only extra thing that XFS_MMAPLOCK_EXCL will lock out are read > pagefaults, which isn't really needed here, but also not actively > harmful. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/xfs_file.c | 8 +++++++- > fs/xfs/xfs_iomap.c | 2 ++ > 2 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index a30fda1985e6af..37dc26f51ace65 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -357,6 +357,7 @@ xfs_file_write_zero_eof( > { > struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host); > loff_t isize; > + int error; > > /* > * We need to serialise against EOF updates that occur in IO completions > @@ -404,7 +405,12 @@ xfs_file_write_zero_eof( > } > > trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize); > - return xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL); > + > + xfs_ilock(ip, XFS_MMAPLOCK_EXCL); > + error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL); > + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); Ah, ok, so we're taking the invalidate_lock so that we can't have page faults that might add folios (or dirty existing ones) in the mapping. We're the only ones who can access the page cache, and we're doing that so that we can zero the folios between the old EOF and the start of the write region. Is that right? Then Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > + > + return error; > } > > /* > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index 1e11f48814c0d0..3c98d82c0ad0dc 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -1435,6 +1435,8 @@ xfs_zero_range( > { > struct inode *inode = VFS_I(ip); > > + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL); > + > if (IS_DAX(inode)) > return dax_zero_range(inode, pos, len, did_zero, > &xfs_dax_write_iomap_ops); > -- > 2.45.2 > >
On Tue, Sep 17, 2024 at 02:24:27PM -0700, Darrick J. Wong wrote: > Ah, ok, so we're taking the invalidate_lock so that we can't have page > faults that might add folios (or dirty existing ones) in the mapping. > We're the only ones who can access the page cache, and we're doing that > so that we can zero the folios between the old EOF and the start of the > write region. > > Is that right? Then Yes. We might eventually also relax this a bit to only take the look if we actually zero anything, but for that we need to do the work to turn the iomap iterators inside out first.
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index a30fda1985e6af..37dc26f51ace65 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -357,6 +357,7 @@ xfs_file_write_zero_eof( { struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host); loff_t isize; + int error; /* * We need to serialise against EOF updates that occur in IO completions @@ -404,7 +405,12 @@ xfs_file_write_zero_eof( } trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize); - return xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL); + + xfs_ilock(ip, XFS_MMAPLOCK_EXCL); + error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL); + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); + + return error; } /* diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 1e11f48814c0d0..3c98d82c0ad0dc 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1435,6 +1435,8 @@ xfs_zero_range( { struct inode *inode = VFS_I(ip); + xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL); + if (IS_DAX(inode)) return dax_zero_range(inode, pos, len, did_zero, &xfs_dax_write_iomap_ops);
xfs_file_write_zero_eof is the only caller of xfs_zero_range that does not take XFS_MMAPLOCK_EXCL (aka the invalidate lock). Currently that is acrually the right thing, as an error in the iomap zeroing code will also take the invalidate_lock to clean up, but to fix that deadlock we need a consistent locking pattern first. The only extra thing that XFS_MMAPLOCK_EXCL will lock out are read pagefaults, which isn't really needed here, but also not actively harmful. Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_file.c | 8 +++++++- fs/xfs/xfs_iomap.c | 2 ++ 2 files changed, 9 insertions(+), 1 deletion(-)