Message ID | 20190815165043.GB15186@magnolia (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | xfs: fix reflink source file racing with directio writes | expand |
On Thu, Aug 15, 2019 at 09:50:43AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > While trawling through the dedupe file comparison code trying to fix > page deadlocking problems, Dave Chinner noticed that the reflink code > only takes shared IOLOCK/MMAPLOCKs on the source file. Because > page_mkwrite and directio writes do not take the EXCL versions of those > locks, this means that reflink can race with writer processes. > > For pure remapping this can lead to undefined behavior and file > corruption; for dedupe this means that we cannot be sure that the > contents are identical when we decide to go ahead with the remapping. > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> I don't think this is quite right yet. The source inode locking change is good, but it doesn't break the layout on the source inode and so there is still they possibility that something has physical access and is directly modifying the source file. And with that, I suspect the locking algorithm changes substantially: order inodes restart: lock first inode break layout on first inode lock second inode break layout on second inode fail then unlock both inodes, goto restart Cheers, Dave.
On Fri, Aug 16, 2019 at 07:37:17AM +1000, Dave Chinner wrote: > I don't think this is quite right yet. The source inode locking > change is good, but it doesn't break the layout on the source inode > and so there is still they possibility that something has physical > access and is directly modifying the source file. > > And with that, I suspect the locking algorithm changes > substantially: > > order inodes > restart: > lock first inode > break layout on first inode > lock second inode > break layout on second inode > fail then unlock both inodes, goto restart Agreed, that is the locking scheme I'd expect.
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index c4ec7afd1170..9fd417de15c9 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1205,7 +1205,7 @@ xfs_iolock_two_inodes_and_break_layout( retry: if (src < dest) { - inode_lock_shared(src); + inode_lock(src); inode_lock_nested(dest, I_MUTEX_NONDIR2); } else { /* src >= dest */ @@ -1216,7 +1216,7 @@ xfs_iolock_two_inodes_and_break_layout( if (error == -EWOULDBLOCK) { inode_unlock(dest); if (src < dest) - inode_unlock_shared(src); + inode_unlock(src); error = break_layout(dest, true); if (error) return error; @@ -1225,11 +1225,11 @@ xfs_iolock_two_inodes_and_break_layout( if (error) { inode_unlock(dest); if (src < dest) - inode_unlock_shared(src); + inode_unlock(src); return error; } if (src > dest) - inode_lock_shared_nested(src, I_MUTEX_NONDIR2); + inode_lock_nested(src, I_MUTEX_NONDIR2); return 0; } @@ -1247,10 +1247,10 @@ xfs_reflink_remap_unlock( xfs_iunlock(dest, XFS_MMAPLOCK_EXCL); if (!same_inode) - xfs_iunlock(src, XFS_MMAPLOCK_SHARED); + xfs_iunlock(src, XFS_MMAPLOCK_EXCL); inode_unlock(inode_out); if (!same_inode) - inode_unlock_shared(inode_in); + inode_unlock(inode_in); } /* @@ -1325,7 +1325,7 @@ xfs_reflink_remap_prep( if (same_inode) xfs_ilock(src, XFS_MMAPLOCK_EXCL); else - xfs_lock_two_inodes(src, XFS_MMAPLOCK_SHARED, dest, + xfs_lock_two_inodes(src, XFS_MMAPLOCK_EXCL, dest, XFS_MMAPLOCK_EXCL); /* Check file eligibility and prepare for block sharing. */