Message ID | 153923117420.5546.13317703807467393934.stgit@magnolia (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs: fixes for serious clone/dedupe problems | expand |
On Wed, Oct 10, 2018 at 09:12:54PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > A deduplication data corruption is exposed by fstests generic/505 on > XFS. It is caused by extending the block match range to include the > partial EOF block, but then allowing unknown data beyond EOF to be > considered a "match" to data in the destination file because the > comparison is only made to the end of the source file. This corrupts the > destination file when the source extent is shared with it. > > The VFS remapping prep functions only support whole block dedupe, but > we still need to appear to support whole file dedupe correctly. Hence > if the dedupe request includes the last block of the souce file, don't > include it in the actual dedupe operation. If the rest of the range > dedupes successfully, then reject the entire request. A subsequent > patch will enable us to shorten dedupe requests correctly. Ok, so this patch rejects whole file dedupe requests, and then a later patch adds support back in for it? Doesn't that leave a bisect landmine behind? Why separate the functionality like this? Cheers, Dave.
On Fri, Oct 12, 2018 at 11:16:16AM +1100, Dave Chinner wrote: > On Wed, Oct 10, 2018 at 09:12:54PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > A deduplication data corruption is exposed by fstests generic/505 on > > XFS. It is caused by extending the block match range to include the > > partial EOF block, but then allowing unknown data beyond EOF to be > > considered a "match" to data in the destination file because the > > comparison is only made to the end of the source file. This corrupts the > > destination file when the source extent is shared with it. > > > > The VFS remapping prep functions only support whole block dedupe, but > > we still need to appear to support whole file dedupe correctly. Hence > > if the dedupe request includes the last block of the souce file, don't > > include it in the actual dedupe operation. If the rest of the range > > dedupes successfully, then reject the entire request. A subsequent > > patch will enable us to shorten dedupe requests correctly. > > Ok, so this patch rejects whole file dedupe requests, and then a > later patch adds support back in for it? > > Doesn't that leave a bisect landmine behind? Why separate the > functionality like this? Heh, it's a leftover from when I was trying to undo the behavior that bytes_deduped == len even if we rounded down. I gave up on that, so this can match the xfs patch. --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Fri, Oct 12, 2018 at 09:22:18PM +0100, Filipe Manana wrote: > On Thu, Oct 11, 2018 at 5:13 AM Darrick J. Wong <darrick.wong@oracle.com> wrote: > > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > A deduplication data corruption is exposed by fstests generic/505 on > > XFS. > > (and btrfs) > > Btw, the generic test I wrote was indeed numbered 505, however it was > never committed and there's now a generic/505 which has nothing to do > with deduplication. > So you should update the changelog to avoid confusion. What test is it now? And if it hasn't been committed, are you going to update it and repost as it clearly had value.... Cheers, Dave.
On Fri, Nov 02, 2018 at 12:04:39PM +0000, Filipe Manana wrote: > On Mon, Oct 15, 2018 at 1:31 AM Dave Chinner <david@fromorbit.com> wrote: > > > > On Fri, Oct 12, 2018 at 09:22:18PM +0100, Filipe Manana wrote: > > > On Thu, Oct 11, 2018 at 5:13 AM Darrick J. Wong <darrick.wong@oracle.com> wrote: > > > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > > > A deduplication data corruption is exposed by fstests generic/505 on > > > > XFS. > > > > > > (and btrfs) > > > > > > Btw, the generic test I wrote was indeed numbered 505, however it was > > > never committed and there's now a generic/505 which has nothing to do > > > with deduplication. > > > So you should update the changelog to avoid confusion. > > > > What test is it now? And if it hasn't been committed, are you going > > to update it and repost as it clearly had value.... > > Sorry, I lost track of this. > > So what was the conclusion of the thread where discussion about this > problem started? > It wasn't clear to me if a consensus was reached and got lost on that > long user space dedupe tools discussion between you and Zygo. > > The test assumed a fix of rounding down the range and deduping less > bytes then requested (which ended up included in 4.19 for btrfs). > > From this vfs patch it seems it was decided to return errno -EDADE instead. > Is this the final decision? No, I reworked the whole mess to match btrfs-4.19 behavior of deduping fewer bytes than requested. --D > > > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@fromorbit.com > > > > -- > Filipe David Manana, > > “Whether you think you can, or you think you can't — you're right.”
diff --git a/fs/read_write.c b/fs/read_write.c index d6e8e242a15f..8498991e2f33 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1723,6 +1723,7 @@ int vfs_clone_file_prep(struct file *file_in, loff_t pos_in, { struct inode *inode_in = file_inode(file_in); struct inode *inode_out = file_inode(file_out); + u64 blkmask = i_blocksize(inode_in) - 1; bool same_inode = (inode_in == inode_out); int ret; @@ -1785,6 +1786,27 @@ int vfs_clone_file_prep(struct file *file_in, loff_t pos_in, return -EBADE; } + /* Are we doing a partial EOF block remapping of some kind? */ + if (*len & blkmask) { + /* + * If the dedupe data matches, don't try to dedupe the partial + * EOF block. + * + * If the user is attempting to remap a partial EOF block and + * it's inside the destination EOF then reject it. + * + * We don't support shortening requests, so we can only reject + * them. + */ + if (is_dedupe) + ret = -EBADE; + else if (pos_out + *len < i_size_read(inode_out)) + ret = -EINVAL; + + if (ret) + return ret; + } + return 1; } EXPORT_SYMBOL(vfs_clone_file_prep);