diff mbox

[PROGS] Import btrfs-extent-same

Message ID 51CB6D5B.70302@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Gabriel de Perthuis June 26, 2013, 10:38 p.m. UTC
Originally from
https://github.com/markfasheh/duperemove/blob/master/btrfs-extent-same.c

Signed-off-by: Gabriel de Perthuis <g2p.code+btrfs@gmail.com>
---
 .gitignore          |   1 +
 Makefile            |   2 +-
 btrfs-extent-same.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 147 insertions(+), 1 deletion(-)
 create mode 100644 btrfs-extent-same.c

Comments

David Sterba Aug. 6, 2013, 3:31 p.m. UTC | #1
On Thu, Jun 27, 2013 at 12:38:19AM +0200, Gabriel de Perthuis wrote:
> Originally from
> https://github.com/markfasheh/duperemove/blob/master/btrfs-extent-same.c

Can you (or Mark) please turn it into a subcommand of dedup? The idea is
to merge both in-bound and out-bound dedup into one command, eg.

 btrfs dedup files dir/*

Liu Bo's dedup patch is in today's integration snapshot
http://repo.or.cz/w/btrfs-progs-unstable/devel.git/shortlog/refs/heads/integration-20130806-2

you can use it as a base.

thanks,
david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Fasheh Aug. 13, 2013, 7:35 p.m. UTC | #2
Hey Dave,

On Tue, Aug 06, 2013 at 05:31:12PM +0200, David Sterba wrote:
> On Thu, Jun 27, 2013 at 12:38:19AM +0200, Gabriel de Perthuis wrote:
> > Originally from
> > https://github.com/markfasheh/duperemove/blob/master/btrfs-extent-same.c
> 
> Can you (or Mark) please turn it into a subcommand of dedup? The idea is
> to merge both in-bound and out-bound dedup into one command, eg.
> 
>  btrfs dedup files dir/*

I can handle this - it's pretty easy but we have to talk about what we're
expecting here.

Specifically, the btrfs-extent-same.c software is just a wrapper around the
ioctl. It never does actual reads of the files or data comparisons, etc. The
syntax you describe above seems like it wants an actual "scan these files
and dedupe them" which is a whole other ball game.

So I would suggest maybe something like the syhntax of btrfs-extent-same.c:

btrfs dedupe files len file1 loff1 file2 loff2 ...

Sound reasonable?
	--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Sept. 2, 2013, 4:43 p.m. UTC | #3
On Tue, Aug 13, 2013 at 12:35:15PM -0700, Mark Fasheh wrote:
> > Can you (or Mark) please turn it into a subcommand of dedup? The idea is
> > to merge both in-bound and out-bound dedup into one command, eg.
> > 
> >  btrfs dedup files dir/*
> 
> I can handle this - it's pretty easy but we have to talk about what we're
> expecting here.

> Specifically, the btrfs-extent-same.c software is just a wrapper around the
> ioctl. It never does actual reads of the files or data comparisons, etc. The
> syntax you describe above seems like it wants an actual "scan these files
> and dedupe them" which is a whole other ball game.

Yes that's what I meant and that's what dupremove in your git tree does,
right?

Using the EXTENT_SAME ioctl without any checks is dangerous, and that's
what btrfs-extent-same.c does, so it's suitable for testing but not
about to be given to users as-is.

> So I would suggest maybe something like the syhntax of btrfs-extent-same.c:
> 
> btrfs dedupe files len file1 loff1 file2 loff2 ...

I'm not sure I see what 'len' means here, length of the dedup block?

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Sept. 3, 2013, 1:28 p.m. UTC | #4
On Mon, Sep 02, 2013 at 05:53:42PM +0100, Simon Farnsworth wrote:
> On Monday 2 September 2013 18:43:58 David Sterba wrote:
> > Yes that's what I meant and that's what dupremove in your git tree does,
> > right?
> > 
> > Using the EXTENT_SAME ioctl without any checks is dangerous, and that's
> > what btrfs-extent-same.c does, so it's suitable for testing but not
> > about to be given to users as-is.
> > 
> Why is using EXTENT_SAME without any checks dangerous?
>
> If userspace has to do checks to guarantee safety, what stops an attacker
> deliberately triggering a TOCTTOU race against a checked user of EXTENT_SAME?
> I would expect that unchecked use of EXTENT_SAME simply causes the kernel to
> return BTRFS_SAME_DATA_DIFFERS most of the time, thus slowing you down.

I was mistaken, the ioctl does checks before merging the extents. Sorry
for confusion.

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Sept. 3, 2013, 1:41 p.m. UTC | #5
On Mon, Sep 02, 2013 at 06:43:58PM +0200, David Sterba wrote:
> > So I would suggest maybe something like the syhntax of btrfs-extent-same.c:
> > 
> > btrfs dedupe files len file1 loff1 file2 loff2 ...
> 
> I'm not sure I see what 'len' means here, length of the dedup block?

Now I'm reading more carefully, the arguments are the same as for
btrfs-extent-same that does only the simple task of deduping just one
extent, but that's not the point behind 'btrfs dedup files *'.

So there are 2 usecases:

1 - give it a bunch of files and try to dedup as much as possible among
    their data
2 - what btrfs-extent-same does, dedup just a specified range in 2 files

I'm not sure if #2 is going to be used widely though it would bring some
flexibility and fine tuning, besides testing purposes.

I think it would be good to keep both modes under one command, so it's a
matter of a sane UI.

#2 would look like:

$ btrfs dedup files --length 4096 --src-offset 0 --dest-offset 4096 file1 file2

and fail if != 2 files are given

#1 :

$ btrfs dedup files --min-length 65536 file1 file2 file3 ...

I think we could come up with more hints like 'min-length' based on user
requirements, but I'd like to get some agreement if this is the way to
go.

thanks,
david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/.gitignore b/.gitignore
index a7e1b19..8f0ec54 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,10 +5,11 @@  version.h
 version
 man/*.gz
 btrfs
 btrfs.static
 btrfs-debug-tree
+btrfs-extent-same
 btrfs-map-logical
 btrfs-fragments
 btrfsck
 calc-size
 ioctl-test
diff --git a/Makefile b/Makefile
index da7438e..35cc502 100644
--- a/Makefile
+++ b/Makefile
@@ -43,11 +43,11 @@  endif
 
 MAKEOPTS = --no-print-directory Q=$(Q)
 
 progs = mkfs.btrfs btrfs-debug-tree btrfsck \
 	btrfs btrfs-map-logical btrfs-image btrfs-zero-log btrfs-convert \
-	btrfs-find-root btrfstune btrfs-show-super
+	btrfs-find-root btrfstune btrfs-show-super btrfs-extent-same
 
 # external libs required by various binaries; for btrfs-foo,
 # specify btrfs_foo_libs = <list of libs>; see $($(subst...)) rules below
 btrfs_convert_libs = -lext2fs -lcom_err
 btrfs_image_libs = -lpthread
diff --git a/btrfs-extent-same.c b/btrfs-extent-same.c
new file mode 100644
index 0000000..acf46b7
--- /dev/null
+++ b/btrfs-extent-same.c
@@ -0,0 +1,145 @@ 
+/*
+ * btrfs-extent-same.c
+ *
+ * Copyright (C) 2013 SUSE.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stddef.h>
+#include <errno.h>
+#include <string.h>
+#include <sys/ioctl.h>
+
+#define BTRFS_IOCTL_MAGIC 0x94
+
+#define BTRFS_IOC_FILE_EXTENT_SAME _IOWR(BTRFS_IOCTL_MAGIC, 54, \
+					 struct btrfs_ioctl_same_args)
+
+#define BTRFS_SAME_DATA_DIFFERS	1
+/* For extent-same ioctl */
+struct btrfs_ioctl_same_extent_info {
+	int64_t fd;			/* in - destination file */
+	uint64_t logical_offset;	/* in - start of extent in destination */
+	uint64_t bytes_deduped;		/* out - total # of bytes we
+					 * were able to dedupe from
+					 * this file */
+	/* status of this dedupe operation:
+	 * 0 if dedup succeeds
+	 * < 0 for error
+	 * == BTRFS_SAME_DATA_DIFFERS if data differs
+	 */
+	int32_t status;			/* out - see above description */
+	uint32_t reserved;
+};
+
+struct btrfs_ioctl_same_args {
+	uint64_t logical_offset;	/* in - start of extent in source */
+	uint64_t length;		/* in - length of extent */
+	uint16_t dest_count;		/* in - total elements in info array */
+	uint16_t reserved1;		/* out - number of files that got deduped */
+	uint32_t reserved2;
+	struct btrfs_ioctl_same_extent_info info[0];
+};
+
+static void usage(const char *prog)
+{
+	printf("Usage: %s len file1 loff1 file2 loff2\n", prog);
+}
+
+int main(int argc, char **argv)
+{
+	int ret, src_fd, i, numfiles;
+	char *srcf, *destf;
+	struct btrfs_ioctl_same_args *same;
+	struct btrfs_ioctl_same_extent_info *info;
+	unsigned long long bytes = 0ULL;
+
+	if (argc < 6 || (argc % 2)) {
+		usage(argv[0]);
+		return 1;
+	}
+
+	numfiles = (argc / 2) - 2;
+
+	printf("Deduping %d total files\n", numfiles + 1);
+
+	same = calloc(1,
+		      sizeof(struct btrfs_ioctl_same_args) +
+		      sizeof(struct btrfs_ioctl_same_extent_info) * numfiles);
+	if (!same)
+		return -ENOMEM;
+
+	srcf = argv[2];
+	same->length = atoll(argv[1]);
+	same->logical_offset = atoll(argv[3]);
+	same->dest_count = numfiles;
+
+	ret = open(srcf, O_RDONLY);
+	if (ret < 0) {
+		ret = errno;
+		fprintf(stderr, "Could not open file %s: (%d) %s\n", srcf, ret,
+			strerror(ret));
+		return -ret;
+	}
+	src_fd = ret;
+
+	printf("(%llu, %llu): %s\n", (unsigned long long)same->logical_offset,
+	       (unsigned long long)same->length, srcf);
+
+	for (i = 0; i < same->dest_count; i++) {
+		destf = argv[4 + (i * 2)];
+
+		ret = open(destf, O_RDONLY);
+		if (ret < 0) {
+			ret = errno;
+			fprintf(stderr, "Could not open file %s: (%d) %s\n",
+				destf, ret, strerror(ret));
+			return -ret;
+		}
+
+		same->info[i].fd = ret;
+		same->info[i].logical_offset = atoll(argv[5 + (i * 2)]);
+		printf("(%llu, %llu): %s\n",
+		       (unsigned long long)same->info[i].logical_offset,
+		       (unsigned long long)same->length, destf);
+
+	}
+
+	ret = ioctl(src_fd, BTRFS_IOC_FILE_EXTENT_SAME, same);
+	if (ret < 0) {
+		ret = errno;
+		fprintf(stderr, "btrfs_same returned error: (%d) %s\n", ret,
+			strerror(ret));
+		return -ret;
+	}
+
+	printf("%u files asked to be deduped\n", same->dest_count);
+
+	for (i = 0; i < same->dest_count; i++) {
+		info = &same->info[i];
+
+		printf("i: %d, status: %d, bytes_deduped: %llu\n", i,
+		       info->status, (unsigned long long)info->bytes_deduped);
+
+		bytes += info->bytes_deduped;
+	}
+
+	printf("%llu total bytes deduped in this operation\n", bytes);
+
+	return ret;
+}