Message ID | 20190813151434.GQ7138@magnolia (mailing list archive)
---|---
State | New, archived
Series | [v3] vfs: fix page locking deadlocks when deduping files
On Tue, Aug 13, 2019 at 08:14:34AM -0700, Darrick J. Wong wrote: > + /* > + * Now that we've locked both pages, make sure they still > + * represent the data we're interested in. If not, someone > + * is invalidating pages on us and we lose. > + */ > + if (src_page->mapping != src->i_mapping || > + src_page->index != srcoff >> PAGE_SHIFT || > + dest_page->mapping != dest->i_mapping || > + dest_page->index != destoff >> PAGE_SHIFT) { > + same = false; > + goto unlock; > + } It is my understanding that you don't need to check the ->index here. If I'm wrong about that, I'd really appreciate being corrected, because the page cache locking is subtle. You call read_mapping_page() which returns the page with an elevated refcount. That means the page can't go back to the page allocator and be allocated again. It can, because it's unlocked, still be truncated, so the check for ->mapping after locking it is needed. But the check for ->index being correct was done by find_get_entry(). See pagecache_get_page() -- if we specify FGP_LOCK, then it will lock the page, check the ->mapping but not check ->index. OK, it does check ->index, but in a VM_BUG_ON(), so it's not something that ought to be able to be wrong.
On Tue, Aug 13, 2019 at 4:15 PM Darrick J. Wong <darrick.wong@oracle.com> wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > When dedupe wants to use the page cache to compare parts of two files > for dedupe, we must be very careful to handle locking correctly. The > current code doesn't do this. It must lock and unlock the page only > once if the two pages are the same, since the overlapping range check > doesn't catch this when blocksize < pagesize. If the pages are distinct > but from the same file, we must observe page locking order and lock them > in order of increasing offset to avoid clashing with writeback locking. > > Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions") > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> We actually had the same bug in btrfs, before we had cloning/dedupe in vfs/xfs/etc, and fixed it back in 2017 [1]. I totally missed this behaviour in the vfs helpers when I updated btrfs to use them some months ago. Thanks. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b1517622f2524f531113b12c27b9a0ea69c38983 > --- > v3: revalidate page after locking it > v2: provide an unlock helper > --- > fs/read_write.c | 50 ++++++++++++++++++++++++++++++++++++++++++-------- > 1 file changed, 42 insertions(+), 8 deletions(-) > > diff --git a/fs/read_write.c b/fs/read_write.c > index 1f5088dec566..da341eb3033c 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in, > return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL; > } > > -/* > - * Read a page's worth of file data into the page cache. Return the page > - * locked. > - */ > +/* Read a page's worth of file data into the page cache. 
*/ > static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset) > { > struct page *page; > @@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset) > put_page(page); > return ERR_PTR(-EIO); > } > - lock_page(page); > return page; > } > > +/* > + * Lock two pages, ensuring that we lock in offset order if the pages are from > + * the same file. > + */ > +static void vfs_lock_two_pages(struct page *page1, struct page *page2) > +{ > + /* Always lock in order of increasing index. */ > + if (page1->index > page2->index) > + swap(page1, page2); > + > + lock_page(page1); > + if (page1 != page2) > + lock_page(page2); > +} > + > +/* Unlock two pages, being careful not to unlock the same page twice. */ > +static void vfs_unlock_two_pages(struct page *page1, struct page *page2) > +{ > + unlock_page(page1); > + if (page1 != page2) > + unlock_page(page2); > +} > + > /* > * Compare extents of two files to see if they are the same. > * Caller must have locked both inodes to prevent write races. > @@ -1867,10 +1886,25 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, > dest_page = vfs_dedupe_get_page(dest, destoff); > if (IS_ERR(dest_page)) { > error = PTR_ERR(dest_page); > - unlock_page(src_page); > put_page(src_page); > goto out_error; > } > + > + vfs_lock_two_pages(src_page, dest_page); > + > + /* > + * Now that we've locked both pages, make sure they still > + * represent the data we're interested in. If not, someone > + * is invalidating pages on us and we lose. 
> + */ > + if (src_page->mapping != src->i_mapping || > + src_page->index != srcoff >> PAGE_SHIFT || > + dest_page->mapping != dest->i_mapping || > + dest_page->index != destoff >> PAGE_SHIFT) { > + same = false; > + goto unlock; > + } > + > src_addr = kmap_atomic(src_page); > dest_addr = kmap_atomic(dest_page); > > @@ -1882,8 +1916,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, > > kunmap_atomic(dest_addr); > kunmap_atomic(src_addr); > - unlock_page(dest_page); > - unlock_page(src_page); > +unlock: > + vfs_unlock_two_pages(src_page, dest_page); > put_page(dest_page); > put_page(src_page); >
On Tue, Aug 13, 2019 at 08:40:10AM -0700, Matthew Wilcox wrote: > On Tue, Aug 13, 2019 at 08:14:34AM -0700, Darrick J. Wong wrote: > > + /* > > + * Now that we've locked both pages, make sure they still > > + * represent the data we're interested in. If not, someone > > + * is invalidating pages on us and we lose. > > + */ > > + if (src_page->mapping != src->i_mapping || > > + src_page->index != srcoff >> PAGE_SHIFT || > > + dest_page->mapping != dest->i_mapping || > > + dest_page->index != destoff >> PAGE_SHIFT) { > > + same = false; > > + goto unlock; > > + } > > It is my understanding that you don't need to check the ->index here. > If I'm wrong about that, I'd really appreciate being corrected, because > the page cache locking is subtle. > > You call read_mapping_page() which returns the page with an elevated > refcount. That means the page can't go back to the page allocator and > be allocated again. It can, because it's unlocked, still be truncated, > so the check for ->mapping after locking it is needed. But the check > for ->index being correct was done by find_get_entry(). > > See pagecache_get_page() -- if we specify FGP_LOCK, then it will lock > the page, check the ->mapping but not check ->index. OK, it does check > ->index, but in a VM_BUG_ON(), so it's not something that ought to be > able to be wrong. That is my understanding as well. In detail... The page data is ready once read_mapping_page() returns successfully. However, if someone needs a stable, untruncated page, lock_page() and a recheck of page->mapping are needed as well. I have no idea how page->index could be changed safely without reallocating the page; even though some paths could keep using a truncated page temporarily with refcounts held, I think those paths cannot add these pages directly back to a page cache without freeing them first, since that seems really unsafe..... Thanks, Gao Xiang >
On Wed, Aug 14, 2019 at 03:03:21PM +0800, Gao Xiang wrote: > On Tue, Aug 13, 2019 at 08:40:10AM -0700, Matthew Wilcox wrote: > > On Tue, Aug 13, 2019 at 08:14:34AM -0700, Darrick J. Wong wrote: > > > + /* > > > + * Now that we've locked both pages, make sure they still > > > + * represent the data we're interested in. If not, someone > > > + * is invalidating pages on us and we lose. > > > + */ > > > + if (src_page->mapping != src->i_mapping || > > > + src_page->index != srcoff >> PAGE_SHIFT || > > > + dest_page->mapping != dest->i_mapping || > > > + dest_page->index != destoff >> PAGE_SHIFT) { > > > + same = false; > > > + goto unlock; > > > + } > > > > It is my understanding that you don't need to check the ->index here. > > If I'm wrong about that, I'd really appreciate being corrected, because > > the page cache locking is subtle. > > > > You call read_mapping_page() which returns the page with an elevated > > refcount. That means the page can't go back to the page allocator and > > be allocated again. It can, because it's unlocked, still be truncated, > > so the check for ->mapping after locking it is needed. But the check > > for ->index being correct was done by find_get_entry(). > > > > See pagecache_get_page() -- if we specify FGP_LOCK, then it will lock > > the page, check the ->mapping but not check ->index. OK, it does check > > ->index, but in a VM_BUG_ON(), so it's not something that ought to be > > able to be wrong. > > That is my understanding as well. In details... > > The page data get ready after read_mapping_page() is successfully > returned. However, if someone needs to get a stable untruncated page, > lock_page() and recheck page->mapping are needed as well. 
> > I have no idea how page->index can be changed safely without reallocating > the page, even some paths could keep using some truncated page temporarily > with some refcounts held but I think those paths cannot add these pages Such a case arises when, even though the page can be truncated concurrently without locking, some paths only need an unstrict read of its page data (and note that these pages were Uptodate before). Therefore those paths can take just a refcount, without holding PG_locked... But such refcounts should only be held temporarily; those pages cannot be added to the page cache again without being reallocated... Thanks, Gao Xiang > directly to some page cache again without freeing since it seems really > unsafe..... > > Thanks, > Gao Xiang > > >
On Tue, Aug 13, 2019 at 08:40:10AM -0700, Matthew Wilcox wrote: > On Tue, Aug 13, 2019 at 08:14:34AM -0700, Darrick J. Wong wrote: > > + /* > > + * Now that we've locked both pages, make sure they still > > + * represent the data we're interested in. If not, someone > > + * is invalidating pages on us and we lose. > > + */ > > + if (src_page->mapping != src->i_mapping || > > + src_page->index != srcoff >> PAGE_SHIFT || > > + dest_page->mapping != dest->i_mapping || > > + dest_page->index != destoff >> PAGE_SHIFT) { > > + same = false; > > + goto unlock; > > + } > > It is my understanding that you don't need to check the ->index here. > If I'm wrong about that, I'd really appreciate being corrected, because > the page cache locking is subtle. Ah, when talking to Darrick about this, I didn't notice the code took references on the page, so it probably doesn't need the index check - the page can't be recycled out from under us here and inserted into a new mapping until we drop the reference. What I was mainly concerned about here is that we only have a shared inode lock on the src inode, so this code can be running concurrently with both invalidation and insertion into the mapping. e.g. direct write io does invalidation, buffered read does insertion. Hence we have to be really careful about the data in the source page being valid and stable while we run the comparison. And on further thought, I don't think shared locking is actually safe here. A shared lock doesn't stop new direct IO from being submitted, so inode_dio_wait() just drains IO at that point in time but doesn't provide any guarantee that there isn't concurrent DIO running. Hence we could do the comparison here, see the data is the same, drop the page lock, a DIO write then invalidates the page and writes new data while we are comparing the rest of page(s) in the range. By the time we've checked the whole range, the data at the start is no longer the same, and the comparison is stale. 
And then we do the dedupe operation oblivious to the fact the data on disk doesn't actually match anymore, and we corrupt the data in the destination file as it gets linked to mismatched data in the source file.... Darrick? > You call read_mapping_page() which returns the page with an elevated > refcount. That means the page can't go back to the page allocator and > be allocated again. It can, because it's unlocked, still be truncated, > so the check for ->mapping after locking it is needed. But the check > for ->index being correct was done by find_get_entry(). > > See pagecache_get_page() -- if we specify FGP_LOCK, then it will lock > the page, check the ->mapping but not check ->index. OK, it does check > ->index, but in a VM_BUG_ON(), so it's not something that ought to be > able to be wrong. Yeah, we used to have to play tricks in the old XFS writeback clustering code to do our own non-blocking page cache lookups and this was one of the things we needed to be careful about until the pagevec_lookup* interfaces came along and solved all the problems for us. Funny how the brain remembers old gotchas without also reminding you that the problems went away almost as long ago..... Cheers, Dave.
On Wed, Aug 14, 2019 at 07:54:48PM +1000, Dave Chinner wrote: > On Tue, Aug 13, 2019 at 08:40:10AM -0700, Matthew Wilcox wrote: > > On Tue, Aug 13, 2019 at 08:14:34AM -0700, Darrick J. Wong wrote: > > > + /* > > > + * Now that we've locked both pages, make sure they still > > > + * represent the data we're interested in. If not, someone > > > + * is invalidating pages on us and we lose. > > > + */ > > > + if (src_page->mapping != src->i_mapping || > > > + src_page->index != srcoff >> PAGE_SHIFT || > > > + dest_page->mapping != dest->i_mapping || > > > + dest_page->index != destoff >> PAGE_SHIFT) { > > > + same = false; > > > + goto unlock; > > > + } > > > > It is my understanding that you don't need to check the ->index here. > > If I'm wrong about that, I'd really appreciate being corrected, because > > the page cache locking is subtle. > > Ah, when talking to Darrick about this, I didn't notice the code > took references on the page, so it probably doesn't need the index > check - the page can't be recycled out from under us here an > inserted into a new mapping until we drop the reference. > > What I was mainly concerned about here is that we only have a shared > inode lock on the src inode, so this code can be running > concurrently with both invalidation and insertion into the mapping. > e.g. direct write io does invalidation, buffered read does > insertion. Hence we have to be really careful about the data in the > source page being valid and stable while we run the comparison. > > And on further thought, I don't think shared locking is actually > safe here. A shared lock doesn't stop new direct IO from being > submitted, so inode_dio_wait() just drains IO at that point in time > and but doesn't provide any guarantee that there isn't concurrent > DIO running. 
> > Hence we could do the comparison here, see the data is the same, > drop the page lock, a DIO write then invalidates the page and writes > new data while we are comparing the rest of page(s) in the range. By > the time we've checked the whole range, the data at the start is no > longer the same, and the comparison is stale. > > And then we do the dedupe operation oblivious to the fact the data > on disk doesn't actually match anymore, and we corrupt the data in > the destination file as it gets linked to mismatched data in the > source file.... <urrrrrrk> Yeah, that looks like a bug to me. I didn't realize that directio writes were IOLOCK_SHARED and therefore reflink has to take IOLOCK_EXCL to block them. Related question: do we need to do the same for MMAPLOCK? I think the answer is yes because xfs_filemap_fault can call page_mkwrite with MMAPLOCK_SHARED. --D > Darrick? > > > You call read_mapping_page() which returns the page with an elevated > > refcount. That means the page can't go back to the page allocator and > > be allocated again. It can, because it's unlocked, still be truncated, > > so the check for ->mapping after locking it is needed. But the check > > for ->index being correct was done by find_get_entry(). > > > > See pagecache_get_page() -- if we specify FGP_LOCK, then it will lock > > the page, check the ->mapping but not check ->index. OK, it does check > > ->index, but in a VM_BUG_ON(), so it's not something that ought to be > > able to be wrong. > > Yeah, we used to have to play tricks in the old XFS writeback > clustering code to do our own non-blocking page cache lookups adn > this was one of the things we needed to be careful about until > the pagevec_lookup* interfaces came along and solved all the > problems for us. Funny how the brain remembers old gotchas with > also reminding you that the problems went away almost as long > ago..... > > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Wed, Aug 14, 2019 at 08:33:49AM -0700, Darrick J. Wong wrote: > On Wed, Aug 14, 2019 at 07:54:48PM +1000, Dave Chinner wrote: > > On Tue, Aug 13, 2019 at 08:40:10AM -0700, Matthew Wilcox wrote: > > > On Tue, Aug 13, 2019 at 08:14:34AM -0700, Darrick J. Wong wrote: > > > > + /* > > > > + * Now that we've locked both pages, make sure they still > > > > + * represent the data we're interested in. If not, someone > > > > + * is invalidating pages on us and we lose. > > > > + */ > > > > + if (src_page->mapping != src->i_mapping || > > > > + src_page->index != srcoff >> PAGE_SHIFT || > > > > + dest_page->mapping != dest->i_mapping || > > > > + dest_page->index != destoff >> PAGE_SHIFT) { > > > > + same = false; > > > > + goto unlock; > > > > + } > > > > > > It is my understanding that you don't need to check the ->index here. > > > If I'm wrong about that, I'd really appreciate being corrected, because > > > the page cache locking is subtle. > > > > Ah, when talking to Darrick about this, I didn't notice the code > > took references on the page, so it probably doesn't need the index > > check - the page can't be recycled out from under us here an > > inserted into a new mapping until we drop the reference. > > > > What I was mainly concerned about here is that we only have a shared > > inode lock on the src inode, so this code can be running > > concurrently with both invalidation and insertion into the mapping. > > e.g. direct write io does invalidation, buffered read does > > insertion. Hence we have to be really careful about the data in the > > source page being valid and stable while we run the comparison. > > > > And on further thought, I don't think shared locking is actually > > safe here. A shared lock doesn't stop new direct IO from being > > submitted, so inode_dio_wait() just drains IO at that point in time > > and but doesn't provide any guarantee that there isn't concurrent > > DIO running. 
> > > > Hence we could do the comparison here, > > drop the page lock, a DIO write then invalidates the page and writes > > new data while we are comparing the rest of page(s) in the range. By > > the time we've checked the whole range, the data at the start is no > > longer the same, and the comparison is stale. > > > > And then we do the dedupe operation oblivious to the fact the data > > on disk doesn't actually match anymore, and we corrupt the data in > > the destination file as it gets linked to mismatched data in the > > source file.... > > <urrrrrrk> Yeah, that looks like a bug to me. I didn't realize that > directio writes were IOLOCK_SHARED and therefore reflink has to take > IOLOCK_EXCL to block them. > > Related question: do we need to do the same for MMAPLOCK? I think the > answer is yes because xfs_filemap_fault can call page_mkwrite with > MMAPLOCK_SHARED. Hmmmm. Yes, you are right, but I don't think just holding MMAPLOCK_EXCL is enough. Holding the MMAPLOCK will hold off page faults while we have the lock, but it won't prevent pages that already have writeable ptes from being modified as they don't require another page fault until they've been cleaned. So it seems to me that if we need to ensure that the file range is not being concurrently modified, we have to: a) inode lock exclusive b) mmap lock exclusive c) break layouts(*) d) wait for dio e) clean all the dirty pages On both the source and destination files. And then, because the locks we hold form a barrier against newly dirtied pages, all attempts to modify the data will be blocked. And so now the data comparison can be done safely. (*) The break layouts bit is necessary to handle co-ordination with RDMA, NVMEoF, P2P DMA, pNFS, etc that manipulate data directly via the block device rather than through file pages or DIO... Cheers, Dave.
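Dave's (a)-(e) sequence amounts to a freeze-the-range pattern. A pseudocode-only sketch for one inode (the helper names are the kernel/XFS interfaces mentioned in this thread, but the exact signatures, the second inode, lock ordering between the two files, and error unwinding are all elided):

```c
/* Pseudocode sketch, not compilable kernel code. */
inode_lock(inode);                      /* (a) exclude new syscall-level writes */
xfs_ilock(ip, XFS_MMAPLOCK_EXCL);       /* (b) exclude new page faults */
xfs_break_layouts(inode, ...);          /* (c) revoke pNFS/RDMA/DAX layouts */
inode_dio_wait(inode);                  /* (d) drain in-flight direct IO */
filemap_write_and_wait_range(           /* (e) write back dirty pages, which
		inode->i_mapping,       	 *     also write-protects their ptes */
		pos, pos + len - 1);

/* ... the data comparison is now stable against concurrent modification ... */
```

Step (e) matters because writeback cleans the pages and removes the writeable ptes, so any further modification must go through a page fault, which (b) blocks.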
On Thu, Aug 15, 2019 at 07:28:33AM +1000, Dave Chinner wrote: > On Wed, Aug 14, 2019 at 08:33:49AM -0700, Darrick J. Wong wrote: > > On Wed, Aug 14, 2019 at 07:54:48PM +1000, Dave Chinner wrote: > > > On Tue, Aug 13, 2019 at 08:40:10AM -0700, Matthew Wilcox wrote: > > > > On Tue, Aug 13, 2019 at 08:14:34AM -0700, Darrick J. Wong wrote: > > > > > + /* > > > > > + * Now that we've locked both pages, make sure they still > > > > > + * represent the data we're interested in. If not, someone > > > > > + * is invalidating pages on us and we lose. > > > > > + */ > > > > > + if (src_page->mapping != src->i_mapping || > > > > > + src_page->index != srcoff >> PAGE_SHIFT || > > > > > + dest_page->mapping != dest->i_mapping || > > > > > + dest_page->index != destoff >> PAGE_SHIFT) { > > > > > + same = false; > > > > > + goto unlock; > > > > > + } > > > > > > > > It is my understanding that you don't need to check the ->index here. > > > > If I'm wrong about that, I'd really appreciate being corrected, because > > > > the page cache locking is subtle. > > > > > > Ah, when talking to Darrick about this, I didn't notice the code > > > took references on the page, so it probably doesn't need the index > > > check - the page can't be recycled out from under us here an > > > inserted into a new mapping until we drop the reference. > > > > > > What I was mainly concerned about here is that we only have a shared > > > inode lock on the src inode, so this code can be running > > > concurrently with both invalidation and insertion into the mapping. > > > e.g. direct write io does invalidation, buffered read does > > > insertion. Hence we have to be really careful about the data in the > > > source page being valid and stable while we run the comparison. > > > > > > And on further thought, I don't think shared locking is actually > > > safe here. 
A shared lock doesn't stop new direct IO from being > > > submitted, so inode_dio_wait() just drains IO at that point in time > > > and but doesn't provide any guarantee that there isn't concurrent > > > DIO running. > > > > > > Hence we could do the comparison here, see the data is the same, > > > drop the page lock, a DIO write then invalidates the page and writes > > > new data while we are comparing the rest of page(s) in the range. By > > > the time we've checked the whole range, the data at the start is no > > > longer the same, and the comparison is stale. > > > > > > And then we do the dedupe operation oblivious to the fact the data > > > on disk doesn't actually match anymore, and we corrupt the data in > > > the destination file as it gets linked to mismatched data in the > > > source file.... > > > > <urrrrrrk> Yeah, that looks like a bug to me. I didn't realize that > > directio writes were IOLOCK_SHARED and therefore reflink has to take > > IOLOCK_EXCL to block them. > > > > Related question: do we need to do the same for MMAPLOCK? I think the > > answer is yes because xfs_filemap_fault can call page_mkwrite with > > MMAPLOCK_SHARED. > > Hmmmm. Yes, you are right, but I don't just holding MMAPLOCK_EXCL is > enough. Holding the MMAPLOCK will hold off page faults while we have > the lock, but it won't prevent pages that already have writeable > ptes from being modified as they don't require another page fault > until they've been cleaned. > > So it seems to me that if we need to ensure that the file range is > not being concurrently modified, we have to: > > a) inode lock exclusive > b) mmap lock exclusive > c) break layouts(*) > d) wait for dio > e) clean all the dirty pages > > On both the source and destination files. And then, because the > locks we hold form a barrier against newly dirtied pages, will all > attempts to modify the data be blocked. And so now the data > comparison can be done safely. 
I think xfs already proceeds in this order (a-e), it's just that we aren't correctly taking IOLOCK_EXCL/MMAPLOCK_EXCL on the source file to prevent some other thread from starting a directio write or dirtying an mmap'd page. But let's try my crappy little patch that fixes the shared locks and see what other smoke comes out of the machine... --D > (*) The break layouts bit is necessary to handle co-ordination with > RDMA, NVMEoF, P2P DMA, pNFS, etc that manipulate data directly via > the block device rather than through file pages or DIO... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
diff --git a/fs/read_write.c b/fs/read_write.c index 1f5088dec566..da341eb3033c 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in, return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL; } -/* - * Read a page's worth of file data into the page cache. Return the page - * locked. - */ +/* Read a page's worth of file data into the page cache. */ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset) { struct page *page; @@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset) put_page(page); return ERR_PTR(-EIO); } - lock_page(page); return page; } +/* + * Lock two pages, ensuring that we lock in offset order if the pages are from + * the same file. + */ +static void vfs_lock_two_pages(struct page *page1, struct page *page2) +{ + /* Always lock in order of increasing index. */ + if (page1->index > page2->index) + swap(page1, page2); + + lock_page(page1); + if (page1 != page2) + lock_page(page2); +} + +/* Unlock two pages, being careful not to unlock the same page twice. */ +static void vfs_unlock_two_pages(struct page *page1, struct page *page2) +{ + unlock_page(page1); + if (page1 != page2) + unlock_page(page2); +} + /* * Compare extents of two files to see if they are the same. * Caller must have locked both inodes to prevent write races. @@ -1867,10 +1886,25 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, dest_page = vfs_dedupe_get_page(dest, destoff); if (IS_ERR(dest_page)) { error = PTR_ERR(dest_page); - unlock_page(src_page); put_page(src_page); goto out_error; } + + vfs_lock_two_pages(src_page, dest_page); + + /* + * Now that we've locked both pages, make sure they still + * represent the data we're interested in. If not, someone + * is invalidating pages on us and we lose. 
+ */ + if (src_page->mapping != src->i_mapping || + src_page->index != srcoff >> PAGE_SHIFT || + dest_page->mapping != dest->i_mapping || + dest_page->index != destoff >> PAGE_SHIFT) { + same = false; + goto unlock; + } + src_addr = kmap_atomic(src_page); dest_addr = kmap_atomic(dest_page); @@ -1882,8 +1916,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, kunmap_atomic(dest_addr); kunmap_atomic(src_addr); - unlock_page(dest_page); - unlock_page(src_page); +unlock: + vfs_unlock_two_pages(src_page, dest_page); put_page(dest_page); put_page(src_page);