diff mbox

[1/1] btrfs: Align EOF length to block in extent_same

Message ID 1422295551-25402-1-git-send-email-git@nerdoftheherd.com (mailing list archive)
State New, archived
Headers show

Commit Message

Matt Robinson Jan. 26, 2015, 6:05 p.m. UTC
It is not currently possible to deduplicate the last block of files
whose size is not a multiple of the block size, as the btrfs_extent_same
ioctl returns -EINVAL if offset + size is greater than the file size or
is not aligned to the fs block size. This prevents btrfs from freeing up
the last extent in the file, causing gains from deduplication to be
smaller than expected.

To resolve this, allow unaligned offset + length values to be passed to
btrfs_ioctl_file_extent_same if offset + length = file size for both src
and dest.  This is implemented in the same way as in btrfs_ioctl_clone.

Signed-off-by: Matt Robinson <git@nerdoftheherd.com>
---
 fs/btrfs/ioctl.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

Comments

David Sterba Jan. 28, 2015, 12:55 p.m. UTC | #1
On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote:
> It is not currently possible to deduplicate the last block of files
> whose size is not a multiple of the block size, as the btrfs_extent_same
> ioctl returns -EINVAL if offset + size is greater than the file size or
> is not aligned to the fs block size.

Do you have a reproducer for that?

The alignment is required to let btrfs_clone and the extent dropping
functions to work. The inline files are not aligned and should not be
deduplicated anyway.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Robinson Jan. 28, 2015, 7:46 p.m. UTC | #2
On 28 January 2015 at 12:55, David Sterba <dsterba@suse.cz> wrote:
> On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote:
>> It is not currently possible to deduplicate the last block of files
>> whose size is not a multiple of the block size, as the btrfs_extent_same
>> ioctl returns -EINVAL if offset + size is greater than the file size or
>> is not aligned to the fs block size.
>
> Do you have a reproducer for that?

I've been using the (quick and dirty) bash script at the end of this
mail which uses btrfs-extent-same from
https://github.com/markfasheh/duperemove/ to call the ioctl.  To
summarize: it creates a new filesystem, creates a file with a size
which is not a multiple of the block size, copies it, and then calls
the ioctl to ask firstly for all of the complete blocks (for
comparison) and then the entire files to be deduplicated.

Running the script under a kernel compiled from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gives
a status of -22 from the second btrfs-extent-same call and the final
btrfs filesystem df shows:
Data, single: total=8.00MiB, used=1.91MiB

However, running under the same kernel plus my patch shows this final
data usage:
Data, single: total=8.00MiB, used=980.00KiB

> The alignment is required to let btrfs_clone and the extent dropping
> functions to work. [...]

Which is why it is currently not possible to deduplicate a final
incomplete block of a file:
* Passing len + offset = the actual end of the file: Rejected as it is
not aligned
* Passing len + offset = the end of the block: Rejected as it exceeds
the actual end of the file.

Please let me know if you need any further information, if my
implementation should be different or there is a better way I could
demonstrate the issue?

Many Thanks,

Matt

---

#!/bin/bash -e

if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root"
   exit 1
fi

loopback=$(losetup -f)

echo "## Create new btrfs filesystem on a loopback device"
dd if=/dev/zero of=testfs bs=1048576 count=1500
losetup $loopback testfs
mkfs.btrfs $loopback
mkdir testfsmnt
mount $loopback testfsmnt

echo -e "\n## Create 1000000 byte random file"
dd if=/dev/urandom of=testfsmnt/test1 bs=1000000 count=1
echo
btrfs filesystem sync testfsmnt
btrfs filesystem df testfsmnt

echo -e "\n## Copy file"
cp testfsmnt/test1 testfsmnt/test2
echo
btrfs filesystem sync testfsmnt
btrfs filesystem df testfsmnt

echo -e "\n## Dedupe to end of last full block"
btrfs-extent-same 999424 testfsmnt/test1 0 testfsmnt/test2 0
echo
btrfs filesystem sync testfsmnt
btrfs filesystem df testfsmnt

echo -e "\n## Dedupe to end of file"
btrfs-extent-same 1000000 testfsmnt/test1 0 testfsmnt/test2 0
echo
btrfs filesystem sync testfsmnt
btrfs filesystem df testfsmnt

echo -e "\nClean up"
umount testfsmnt
rmdir testfsmnt
losetup -d $loopback
rm testfs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Robinson March 2, 2015, 8:59 p.m. UTC | #3
Hi David,

Have you had a chance to look at this?  Am very happy to answer
further questions, adjust my implementation, provide a different kind
of test case, etc.

Many Thanks,

Matt

On 28 January 2015 at 19:46, Matt Robinson <git@nerdoftheherd.com> wrote:
> On 28 January 2015 at 12:55, David Sterba <dsterba@suse.cz> wrote:
>> On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote:
>>> It is not currently possible to deduplicate the last block of files
>>> whose size is not a multiple of the block size, as the btrfs_extent_same
>>> ioctl returns -EINVAL if offset + size is greater than the file size or
>>> is not aligned to the fs block size.
>>
>> Do you have a reproducer for that?
>
> I've been using the (quick and dirty) bash script at the end of this
> mail which uses btrfs-extent-same from
> https://github.com/markfasheh/duperemove/ to call the ioctl.  To
> summarize: it creates a new filesystem, creates a file with a size
> which is not a multiple of the block size, copies it, and then calls
> the ioctl to ask firstly for all of the complete blocks (for
> comparison) and then the entire files to be deduplicated.
>
> Running the script under a kernel compiled from
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gives
> a status of -22 from the second btrfs-extent-same call and the final
> btrfs filesystem df shows:
> Data, single: total=8.00MiB, used=1.91MiB
>
> However, running under the same kernel plus my patch shows this final
> data usage:
> Data, single: total=8.00MiB, used=980.00KiB
>
>> The alignment is required to let btrfs_clone and the extent dropping
>> functions to work. [...]
>
> Which is why it is currently not possible to deduplicate a final
> incomplete block of a file:
> * Passing len + offset = the actual end of the file: Rejected as it is
> not aligned
> * Passing len + offset = the end of the block: Rejected as it exceeds
> the actual end of the file.
>
> Please let me know if you need any further information, if my
> implementation should be different or there is a better way I could
> demonstrate the issue?
>
> Many Thanks,
>
> Matt
>
> ---
>
> #!/bin/bash -e
>
> if [[ $EUID -ne 0 ]]; then
>    echo "This script must be run as root"
>    exit 1
> fi
>
> loopback=$(losetup -f)
>
> echo "## Create new btrfs filesystem on a loopback device"
> dd if=/dev/zero of=testfs bs=1048576 count=1500
> losetup $loopback testfs
> mkfs.btrfs $loopback
> mkdir testfsmnt
> mount $loopback testfsmnt
>
> echo -e "\n## Create 1000000 byte random file"
> dd if=/dev/urandom of=testfsmnt/test1 bs=1000000 count=1
> echo
> btrfs filesystem sync testfsmnt
> btrfs filesystem df testfsmnt
>
> echo -e "\n## Copy file"
> cp testfsmnt/test1 testfsmnt/test2
> echo
> btrfs filesystem sync testfsmnt
> btrfs filesystem df testfsmnt
>
> echo -e "\n## Dedupe to end of last full block"
> btrfs-extent-same 999424 testfsmnt/test1 0 testfsmnt/test2 0
> echo
> btrfs filesystem sync testfsmnt
> btrfs filesystem df testfsmnt
>
> echo -e "\n## Dedupe to end of file"
> btrfs-extent-same 1000000 testfsmnt/test1 0 testfsmnt/test2 0
> echo
> btrfs filesystem sync testfsmnt
> btrfs filesystem df testfsmnt
>
> echo -e "\nClean up"
> umount testfsmnt
> rmdir testfsmnt
> losetup -d $loopback
> rm testfs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zygo Blaxell March 3, 2015, 12:27 a.m. UTC | #4
I second this.  I've seen the same behavior.

Clone seems to have evolved a little further than extent-same knows about.
e.g. there is code in the extent-same ioctl that tries to avoid doing
a clone from within one inode to elsewhere in the same inode; however,
the clone ioctl (which extent-same calls) has no such restriction.

As Matt mentioned, clone_range seems quite happy to accept a partial block
at EOF.  cp --reflink would be much harder to use if it did not.

On Mon, Mar 02, 2015 at 08:59:11PM +0000, Matt Robinson wrote:
> Hi David,
> 
> Have you had a chance to look at this?  Am very happy to answer
> further questions, adjust my implementation, provide a different kind
> of test case, etc.
> 
> Many Thanks,
> 
> Matt
> 
> On 28 January 2015 at 19:46, Matt Robinson <git@nerdoftheherd.com> wrote:
> > On 28 January 2015 at 12:55, David Sterba <dsterba@suse.cz> wrote:
> >> On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote:
> >>> It is not currently possible to deduplicate the last block of files
> >>> whose size is not a multiple of the block size, as the btrfs_extent_same
> >>> ioctl returns -EINVAL if offset + size is greater than the file size or
> >>> is not aligned to the fs block size.
> >>
> >> Do you have a reproducer for that?
> >
> > I've been using the (quick and dirty) bash script at the end of this
> > mail which uses btrfs-extent-same from
> > https://github.com/markfasheh/duperemove/ to call the ioctl.  To
> > summarize: it creates a new filesystem, creates a file with a size
> > which is not a multiple of the block size, copies it, and then calls
> > the ioctl to ask firstly for all of the complete blocks (for
> > comparison) and then the entire files to be deduplicated.
> >
> > Running the script under a kernel compiled from
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gives
> > a status of -22 from the second btrfs-extent-same call and the final
> > btrfs filesystem df shows:
> > Data, single: total=8.00MiB, used=1.91MiB
> >
> > However, running under the same kernel plus my patch shows this final
> > data usage:
> > Data, single: total=8.00MiB, used=980.00KiB
> >
> >> The alignment is required to let btrfs_clone and the extent dropping
> >> functions to work. [...]
> >
> > Which is why it is currently not possible to deduplicate a final
> > incomplete block of a file:
> > * Passing len + offset = the actual end of the file: Rejected as it is
> > not aligned
> > * Passing len + offset = the end of the block: Rejected as it exceeds
> > the actual end of the file.
> >
> > Please let me know if you need any further information, if my
> > implementation should be different or there is a better way I could
> > demonstrate the issue?
> >
> > Many Thanks,
> >
> > Matt
> >
> > ---
> >
> > #!/bin/bash -e
> >
> > if [[ $EUID -ne 0 ]]; then
> >    echo "This script must be run as root"
> >    exit 1
> > fi
> >
> > loopback=$(losetup -f)
> >
> > echo "## Create new btrfs filesystem on a loopback device"
> > dd if=/dev/zero of=testfs bs=1048576 count=1500
> > losetup $loopback testfs
> > mkfs.btrfs $loopback
> > mkdir testfsmnt
> > mount $loopback testfsmnt
> >
> > echo -e "\n## Create 1000000 byte random file"
> > dd if=/dev/urandom of=testfsmnt/test1 bs=1000000 count=1
> > echo
> > btrfs filesystem sync testfsmnt
> > btrfs filesystem df testfsmnt
> >
> > echo -e "\n## Copy file"
> > cp testfsmnt/test1 testfsmnt/test2
> > echo
> > btrfs filesystem sync testfsmnt
> > btrfs filesystem df testfsmnt
> >
> > echo -e "\n## Dedupe to end of last full block"
> > btrfs-extent-same 999424 testfsmnt/test1 0 testfsmnt/test2 0
> > echo
> > btrfs filesystem sync testfsmnt
> > btrfs filesystem df testfsmnt
> >
> > echo -e "\n## Dedupe to end of file"
> > btrfs-extent-same 1000000 testfsmnt/test1 0 testfsmnt/test2 0
> > echo
> > btrfs filesystem sync testfsmnt
> > btrfs filesystem df testfsmnt
> >
> > echo -e "\nClean up"
> > umount testfsmnt
> > rmdir testfsmnt
> > losetup -d $loopback
> > rm testfs
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Robinson April 11, 2015, 7:17 p.m. UTC | #5
Hi All,

As David hasn't got back to me I'm guessing that he is too busy with
other things at present. If anyone else is able to spare the time to
review my patch and give me feedback that would be very much
appreciated.

Many Thanks,

Matt

On 3 March 2015 at 00:27, Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote:
>
> I second this.  I've seen the same behavior.
>
> Clone seems to have evolved a little further than extent-same knows about.
> e.g. there is code in the extent-same ioctl that tries to avoid doing
> a clone from within one inode to elsewhere in the same inode; however,
> the clone ioctl (which extent-same calls) has no such restriction.
>
> As Matt mentioned, clone_range seems quite happy to accept a partial block
> at EOF.  cp --reflink would be much harder to use if it did not.
>
> On Mon, Mar 02, 2015 at 08:59:11PM +0000, Matt Robinson wrote:
> > Hi David,
> >
> > Have you had a chance to look at this?  Am very happy to answer
> > further questions, adjust my implementation, provide a different kind
> > of test case, etc.
> >
> > Many Thanks,
> >
> > Matt
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d49fe8a..a407d8a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2871,14 +2871,16 @@  static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
 	return ret;
 }
 
-static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len)
+static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len,
+				     u64 len_aligned)
 {
 	u64 bs = BTRFS_I(inode)->root->fs_info->sb->s_blocksize;
 
 	if (off + len > inode->i_size || off + len < off)
 		return -EINVAL;
+
 	/* Check that we are block aligned - btrfs_clone() requires this */
-	if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len, bs))
+	if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len_aligned, bs))
 		return -EINVAL;
 
 	return 0;
@@ -2888,6 +2890,8 @@  static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 			     struct inode *dst, u64 dst_loff)
 {
 	int ret;
+	u64 len_aligned = len;
+	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
 
 	/*
 	 * btrfs_clone() can't handle extents in the same file
@@ -2899,11 +2903,15 @@  static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 
 	btrfs_double_lock(src, loff, dst, dst_loff, len);
 
-	ret = extent_same_check_offsets(src, loff, len);
+	/* if we extend to both eofs, continue to block boundaries */
+	if (loff + len == src->i_size && dst_loff + len == dst->i_size)
+		len_aligned = ALIGN(src->i_size, bs) - loff;
+
+	ret = extent_same_check_offsets(src, loff, len, len_aligned);
 	if (ret)
 		goto out_unlock;
 
-	ret = extent_same_check_offsets(dst, dst_loff, len);
+	ret = extent_same_check_offsets(dst, dst_loff, len, len_aligned);
 	if (ret)
 		goto out_unlock;
 
@@ -2916,7 +2924,7 @@  static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
 	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, len, len, dst_loff);
+		ret = btrfs_clone(src, dst, loff, len, len_aligned, dst_loff);
 
 out_unlock:
 	btrfs_double_unlock(src, loff, dst, dst_loff, len);
@@ -3162,8 +3170,7 @@  static void clone_update_extent_map(struct inode *inode,
  * @inode: Inode to clone to
  * @off: Offset within source to start clone from
  * @olen: Original length, passed by user, of range to clone
- * @olen_aligned: Block-aligned value of olen, extent_same uses
- *               identical values here
+ * @olen_aligned: Block-aligned value of olen
  * @destoff: Offset within @inode to start clone
  */
 static int btrfs_clone(struct inode *src, struct inode *inode,