Message ID | 1422295551-25402-1-git-send-email-git@nerdoftheherd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote: > It is not currently possible to deduplicate the last block of files > whose size is not a multiple of the block size, as the btrfs_extent_same > ioctl returns -EINVAL if offset + size is greater than the file size or > is not aligned to the fs block size. Do you have a reproducer for that? The alignment is required to let btrfs_clone and the extent dropping functions to work. The inline files are not aligned and should not be deduplicated anyway. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 28 January 2015 at 12:55, David Sterba <dsterba@suse.cz> wrote: > On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote: >> It is not currently possible to deduplicate the last block of files >> whose size is not a multiple of the block size, as the btrfs_extent_same >> ioctl returns -EINVAL if offset + size is greater than the file size or >> is not aligned to the fs block size. > > Do you have a reproducer for that? I've been using the (quick and dirty) bash script at the end of this mail which uses btrfs-extent-same from https://github.com/markfasheh/duperemove/ to call the ioctl. To summarize: it creates a new filesystem, creates a file with a size which is not a multiple of the block size, copies it, and then calls the ioctl to ask firstly for all of the complete blocks (for comparison) and then the entire files to be deduplicated. Running the script under a kernel compiled from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gives a status of -22 from the second btrfs-extent-same call and the final btrfs filesystem df shows: Data, single: total=8.00MiB, used=1.91MiB However, running under the same kernel plus my patch shows this final data usage: Data, single: total=8.00MiB, used=980.00KiB > The alignment is required to let btrfs_clone and the extent dropping > functions to work. [...] Which is why it is currently not possible to deduplicate a final incomplete block of a file: * Passing len + offset = the actual end of the file: Rejected as it is not aligned * Passing len + offset = the end of the block: Rejected as it exceeds the actual end of the file. Please let me know if you need any further information, if my implementation should be different or there is a better way I could demonstrate the issue? Many Thanks, Matt --- #!/bin/bash -e if [[ $EUID -ne 0 ]]; then echo "This script must be run as root" exit 1 fi loopback=$(losetup -f) echo "## Create new btrfs filesystem on a loopback device" dd if=/dev/zero of=testfs bs=1048576 count=1500 losetup $loopback testfs mkfs.btrfs $loopback mkdir testfsmnt mount $loopback testfsmnt echo -e "\n## Create 1000000 byte random file" dd if=/dev/urandom of=testfsmnt/test1 bs=1000000 count=1 echo btrfs filesystem sync testfsmnt btrfs filesystem df testfsmnt echo -e "\n## Copy file" cp testfsmnt/test1 testfsmnt/test2 echo btrfs filesystem sync testfsmnt btrfs filesystem df testfsmnt echo -e "\n## Dedupe to end of last full block" btrfs-extent-same 999424 testfsmnt/test1 0 testfsmnt/test2 0 echo btrfs filesystem sync testfsmnt btrfs filesystem df testfsmnt echo -e "\n## Dedupe to end of file" btrfs-extent-same 1000000 testfsmnt/test1 0 testfsmnt/test2 0 echo btrfs filesystem sync testfsmnt btrfs filesystem df testfsmnt echo -e "\nClean up" umount testfsmnt rmdir testfsmnt losetup -d $loopback rm testfs -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi David, Have you had a chance to look at this? Am very happy to answer further questions, adjust my implementation, provide a different kind of test case, etc. Many Thanks, Matt On 28 January 2015 at 19:46, Matt Robinson <git@nerdoftheherd.com> wrote: > On 28 January 2015 at 12:55, David Sterba <dsterba@suse.cz> wrote: >> On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote: >>> It is not currently possible to deduplicate the last block of files >>> whose size is not a multiple of the block size, as the btrfs_extent_same >>> ioctl returns -EINVAL if offset + size is greater than the file size or >>> is not aligned to the fs block size. >> >> Do you have a reproducer for that? > > I've been using the (quick and dirty) bash script at the end of this > mail which uses btrfs-extent-same from > https://github.com/markfasheh/duperemove/ to call the ioctl. To > summarize: it creates a new filesystem, creates a file with a size > which is not a multiple of the block size, copies it, and then calls > the ioctl to ask firstly for all of the complete blocks (for > comparison) and then the entire files to be deduplicated. > > Running the script under a kernel compiled from > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gives > a status of -22 from the second btrfs-extent-same call and the final > btrfs filesystem df shows: > Data, single: total=8.00MiB, used=1.91MiB > > However, running under the same kernel plus my patch shows this final > data usage: > Data, single: total=8.00MiB, used=980.00KiB > >> The alignment is required to let btrfs_clone and the extent dropping >> functions to work. [...] > > Which is why it is currently not possible to deduplicate a final > incomplete block of a file: > * Passing len + offset = the actual end of the file: Rejected as it is > not aligned > * Passing len + offset = the end of the block: Rejected as it exceeds > the actual end of the file. > > Please let me know if you need any further information, if my > implementation should be different or there is a better way I could > demonstrate the issue? > > Many Thanks, > > Matt > > --- > > #!/bin/bash -e > > if [[ $EUID -ne 0 ]]; then > echo "This script must be run as root" > exit 1 > fi > > loopback=$(losetup -f) > > echo "## Create new btrfs filesystem on a loopback device" > dd if=/dev/zero of=testfs bs=1048576 count=1500 > losetup $loopback testfs > mkfs.btrfs $loopback > mkdir testfsmnt > mount $loopback testfsmnt > > echo -e "\n## Create 1000000 byte random file" > dd if=/dev/urandom of=testfsmnt/test1 bs=1000000 count=1 > echo > btrfs filesystem sync testfsmnt > btrfs filesystem df testfsmnt > > echo -e "\n## Copy file" > cp testfsmnt/test1 testfsmnt/test2 > echo > btrfs filesystem sync testfsmnt > btrfs filesystem df testfsmnt > > echo -e "\n## Dedupe to end of last full block" > btrfs-extent-same 999424 testfsmnt/test1 0 testfsmnt/test2 0 > echo > btrfs filesystem sync testfsmnt > btrfs filesystem df testfsmnt > > echo -e "\n## Dedupe to end of file" > btrfs-extent-same 1000000 testfsmnt/test1 0 testfsmnt/test2 0 > echo > btrfs filesystem sync testfsmnt > btrfs filesystem df testfsmnt > > echo -e "\nClean up" > umount testfsmnt > rmdir testfsmnt > losetup -d $loopback > rm testfs -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I second this. I've seen the same behavior. Clone seems to have evolved a little further than extent-same knows about. e.g. there is code in the extent-same ioctl that tries to avoid doing a clone from within one inode to elsewhere in the same inode; however, the clone ioctl (which extent-same calls) has no such restriction. As Matt mentioned, clone_range seems quite happy to accept a partial block at EOF. cp --reflink would be much harder to use if it did not. On Mon, Mar 02, 2015 at 08:59:11PM +0000, Matt Robinson wrote: > Hi David, > > Have you had a chance to look at this? Am very happy to answer > further questions, adjust my implementation, provide a different kind > of test case, etc. > > Many Thanks, > > Matt > > On 28 January 2015 at 19:46, Matt Robinson <git@nerdoftheherd.com> wrote: > > On 28 January 2015 at 12:55, David Sterba <dsterba@suse.cz> wrote: > >> On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote: > >>> It is not currently possible to deduplicate the last block of files > >>> whose size is not a multiple of the block size, as the btrfs_extent_same > >>> ioctl returns -EINVAL if offset + size is greater than the file size or > >>> is not aligned to the fs block size. > >> > >> Do you have a reproducer for that? > > > > I've been using the (quick and dirty) bash script at the end of this > > mail which uses btrfs-extent-same from > > https://github.com/markfasheh/duperemove/ to call the ioctl. To > > summarize: it creates a new filesystem, creates a file with a size > > which is not a multiple of the block size, copies it, and then calls > > the ioctl to ask firstly for all of the complete blocks (for > > comparison) and then the entire files to be deduplicated. > > > > Running the script under a kernel compiled from > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gives > > a status of -22 from the second btrfs-extent-same call and the final > > btrfs filesystem df shows: > > Data, single: total=8.00MiB, used=1.91MiB > > > > However, running under the same kernel plus my patch shows this final > > data usage: > > Data, single: total=8.00MiB, used=980.00KiB > > > >> The alignment is required to let btrfs_clone and the extent dropping > >> functions to work. [...] > > > > Which is why it is currently not possible to deduplicate a final > > incomplete block of a file: > > * Passing len + offset = the actual end of the file: Rejected as it is > > not aligned > > * Passing len + offset = the end of the block: Rejected as it exceeds > > the actual end of the file. > > > > Please let me know if you need any further information, if my > > implementation should be different or there is a better way I could > > demonstrate the issue? > > > > Many Thanks, > > > > Matt > > > > --- > > > > #!/bin/bash -e > > > > if [[ $EUID -ne 0 ]]; then > > echo "This script must be run as root" > > exit 1 > > fi > > > > loopback=$(losetup -f) > > > > echo "## Create new btrfs filesystem on a loopback device" > > dd if=/dev/zero of=testfs bs=1048576 count=1500 > > losetup $loopback testfs > > mkfs.btrfs $loopback > > mkdir testfsmnt > > mount $loopback testfsmnt > > > > echo -e "\n## Create 1000000 byte random file" > > dd if=/dev/urandom of=testfsmnt/test1 bs=1000000 count=1 > > echo > > btrfs filesystem sync testfsmnt > > btrfs filesystem df testfsmnt > > > > echo -e "\n## Copy file" > > cp testfsmnt/test1 testfsmnt/test2 > > echo > > btrfs filesystem sync testfsmnt > > btrfs filesystem df testfsmnt > > > > echo -e "\n## Dedupe to end of last full block" > > btrfs-extent-same 999424 testfsmnt/test1 0 testfsmnt/test2 0 > > echo > > btrfs filesystem sync testfsmnt > > btrfs filesystem df testfsmnt > > > > echo -e "\n## Dedupe to end of file" > > btrfs-extent-same 1000000 testfsmnt/test1 0 testfsmnt/test2 0 > > echo > > btrfs filesystem sync testfsmnt > > btrfs filesystem df testfsmnt > > > > echo -e "\nClean up" > > umount testfsmnt > > rmdir testfsmnt > > losetup -d $loopback > > rm testfs > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi All, As David hasn't got back to me I'm guessing that he is too busy with other things at present. If anyone else is able to spare the time to review my patch and give me feedback that would be very much appreciated. Many Thanks, Matt On 3 March 2015 at 00:27, Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote: > > I second this. I've seen the same behavior. > > Clone seems to have evolved a little further than extent-same knows about. > e.g. there is code in the extent-same ioctl that tries to avoid doing > a clone from within one inode to elsewhere in the same inode; however, > the clone ioctl (which extent-same calls) has no such restriction. > > As Matt mentioned, clone_range seems quite happy to accept a partial block > at EOF. cp --reflink would be much harder to use if it did not. > > On Mon, Mar 02, 2015 at 08:59:11PM +0000, Matt Robinson wrote: > > Hi David, > > > > Have you had a chance to look at this? Am very happy to answer > > further questions, adjust my implementation, provide a different kind > > of test case, etc. > > > > Many Thanks, > > > > Matt -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index d49fe8a..a407d8a 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2871,14 +2871,16 @@ static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst, return ret; } -static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len) +static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len, + u64 len_aligned) { u64 bs = BTRFS_I(inode)->root->fs_info->sb->s_blocksize; if (off + len > inode->i_size || off + len < off) return -EINVAL; + /* Check that we are block aligned - btrfs_clone() requires this */ - if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len, bs)) + if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len_aligned, bs)) return -EINVAL; return 0; @@ -2888,6 +2890,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len, struct inode *dst, u64 dst_loff) { int ret; + u64 len_aligned = len; + u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize; /* * btrfs_clone() can't handle extents in the same file @@ -2899,11 +2903,15 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len, btrfs_double_lock(src, loff, dst, dst_loff, len); - ret = extent_same_check_offsets(src, loff, len); + /* if we extend to both eofs, continue to block boundaries */ + if (loff + len == src->i_size && dst_loff + len == dst->i_size) + len_aligned = ALIGN(src->i_size, bs) - loff; + + ret = extent_same_check_offsets(src, loff, len, len_aligned); if (ret) goto out_unlock; - ret = extent_same_check_offsets(dst, dst_loff, len); + ret = extent_same_check_offsets(dst, dst_loff, len, len_aligned); if (ret) goto out_unlock; @@ -2916,7 +2924,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len, ret = btrfs_cmp_data(src, loff, dst, dst_loff, len); if (ret == 0) - ret = btrfs_clone(src, dst, loff, len, len, dst_loff); + ret = btrfs_clone(src, dst, loff, len, len_aligned, dst_loff); out_unlock: btrfs_double_unlock(src, loff, dst, dst_loff, len); @@ -3162,8 +3170,7 @@ static void clone_update_extent_map(struct inode *inode, * @inode: Inode to clone to * @off: Offset within source to start clone from * @olen: Original length, passed by user, of range to clone - * @olen_aligned: Block-aligned value of olen, extent_same uses - * identical values here + * @olen_aligned: Block-aligned value of olen * @destoff: Offset within @inode to start clone */ static int btrfs_clone(struct inode *src, struct inode *inode,
It is not currently possible to deduplicate the last block of files whose size is not a multiple of the block size, as the btrfs_extent_same ioctl returns -EINVAL if offset + size is greater than the file size or is not aligned to the fs block size. This prevents btrfs from freeing up the last extent in the file, causing gains from deduplication to be smaller than expected. To resolve this, allow unaligned offset + length values to be passed to btrfs_ioctl_file_extent_same if offset + length = file size for both src and dest. This is implemented in the same way as in btrfs_ioctl_clone. Signed-off-by: Matt Robinson <git@nerdoftheherd.com> --- fs/btrfs/ioctl.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-)