From patchwork Wed Oct 21 01:11:22 2015
X-Patchwork-Submitter: Timofey Titovets
X-Patchwork-Id: 7453981
To: linux-btrfs@vger.kernel.org
From: Timofey Titovets
Subject: [RFC PATCH V2] btrfs/ioctl.c: extent_same - use the inode closest to the beginning of the disk as clone source
Message-ID: <5626E63A.2090608@gmail.com>
Date: Wed, 21 Oct 2015 04:11:22 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
X-Mailing-List: linux-btrfs@vger.kernel.org

This is just a proof of concept, and I hope to get feedback/ideas/review on it.

---
During deduplication, btrfs produces extent and file fragmentation, but this
can be mitigated by computing which inode's data is placed closest to the
beginning of the disk.

This allows:
1. A performance boost on HDDs (the beginning of the disk is faster than the end)
2. Free space is concentrated at the tail of the fs, which can later speed up
   balance and resize operations

New function:
static u64 btrfs_avg_disko(struct inode *inode, const u64 off, const u64 olen_aligned);

It normalizes offsets against data lengths by representing them as block
offsets, and returns the average disk offset of all page-sized blocks in the
given range of the inode. The function is cloned from btrfs_clone().

Changes from V1:
Added the new function, which computes the "normalized" offset

Signed-off-by: Timofey Titovets
---
 fs/btrfs/ioctl.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 145 insertions(+), 2 deletions(-)

From 0fcd8c8f03064b9d7ed89606f3eff65ffc53cbf5 Mon Sep 17 00:00:00 2001
From: Timofey Titovets
Date: Wed, 21 Oct 2015 04:10:13 +0300
Subject: [RFC PATCH V2] btrfs/ioctl.c: extent_same - use the inode closest to
 the beginning of the disk as clone source

During deduplication, btrfs produces extent and file fragmentation, but this
can be mitigated by computing which inode's data is placed closest to the
beginning of the disk.

This allows:
1. A performance boost on HDDs (the beginning of the disk is faster than the end)
2. Free space is concentrated at the tail of the fs, which can later speed up
   balance and resize operations

New function:
static u64 btrfs_avg_disko(struct inode *inode, const u64 off, const u64 olen_aligned);

It normalizes offsets against data lengths by representing them as block
offsets, and returns the average disk offset of all page-sized blocks in the
given range of the inode.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/ioctl.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 145 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3e3e613..cd0ecd3 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -86,6 +86,9 @@ struct btrfs_ioctl_received_subvol_args_32 {
 #endif
 
+static u64 btrfs_avg_disko(struct inode *inode,
+			   const u64 off, const u64 olen_aligned);
+
 static int btrfs_clone(struct inode *src, struct inode *inode,
 		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
 		       int no_time_update);
@@ -3074,8 +3077,20 @@ static int btrfs_extent_same(struct inode *src,
					     u64 loff, u64 olen,
 	/* pass original length for comparison so we stay within i_size */
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
-	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
+	if (ret == 0) {
+		/* prefer the inode with the lowest average offset as clone source */
+		u64 src_weight = btrfs_avg_disko(src, loff, olen);
+		u64 dst_weight = btrfs_avg_disko(dst, dst_loff, olen);
+
+		/* a weight of 0 means "unknown", fall back to the original order */
+		if (dst_weight == 0)
+			src_weight = 0;
+		if (src_weight > dst_weight)
+			ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
+		else
+			ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
+	}
 
 	if (same_inode)
 		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
@@ -3329,6 +3344,134 @@ static void clone_update_extent_map(struct inode *inode,
 }
 
 /**
+ * btrfs_avg_disko() - return the average disk offset of an inode's data
+ *
+ * @inode:        Inode
+ * @off:          Offset to start computing from
+ * @olen_aligned: Block-aligned length of the data
+ *
+ * Computes the average disk offset of the data, which lets us
+ * heuristically determine where on the disk most of the data
+ * fragments are placed.
+ */
+static u64 btrfs_avg_disko(struct inode *inode,
+			   const u64 off, const u64 olen_aligned)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_path *path = NULL;
+	struct extent_buffer *leaf;
+	struct btrfs_key key;
+	u32 nritems;
+	int slot;
+	int ret;
+	u64 sum = 0;
+	u64 counter = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return 0;
+
+	path->reada = 2;
+	key.objectid = btrfs_ino(inode);
+	key.type = BTRFS_EXTENT_DATA_KEY;
+	key.offset = off;
+
+	while (1) {
+		u64 next_key_min_offset = key.offset + 1;
+
+		/*
+		 * note the key will change type as we walk through the
+		 * tree.
+		 */
+		path->leave_spinning = 1;
+		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+		if (ret < 0)
+			goto out;
+		/*
+		 * First search, if no extent item that starts at offset off was
+		 * found but the previous item is an extent item, it's possible
+		 * it might overlap our target range, therefore process it.
+		 */
+		if (key.offset == off && ret > 0 && path->slots[0] > 0) {
+			btrfs_item_key_to_cpu(path->nodes[0], &key,
+					      path->slots[0] - 1);
+			if (key.type == BTRFS_EXTENT_DATA_KEY)
+				path->slots[0]--;
+		}
+
+		nritems = btrfs_header_nritems(path->nodes[0]);
+process_slot:
+		if (path->slots[0] >= nritems) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0)
+				goto out;
+			if (ret > 0)
+				break;
+			nritems = btrfs_header_nritems(path->nodes[0]);
+		}
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (key.type > BTRFS_EXTENT_DATA_KEY ||
+		    key.objectid != btrfs_ino(inode))
+			break;
+
+		if (key.type == BTRFS_EXTENT_DATA_KEY) {
+			struct btrfs_file_extent_item *extent;
+			int type;
+			u64 disko = 0;
+			u64 diskl = 0;
+			u64 datal = 0;
+
+			extent = btrfs_item_ptr(leaf, slot,
+						struct btrfs_file_extent_item);
+			type = btrfs_file_extent_type(leaf, extent);
+			if (type == BTRFS_FILE_EXTENT_REG ||
+			    type == BTRFS_FILE_EXTENT_PREALLOC) {
+				disko = btrfs_file_extent_disk_bytenr(leaf, extent);
+				diskl = btrfs_file_extent_disk_num_bytes(leaf, extent);
+				datal = btrfs_file_extent_num_bytes(leaf, extent);
+			}
+
+			/*
+			 * The first search might have left us at an extent
+			 * item that ends before our target range's start, can
+			 * happen if we have holes and NO_HOLES feature enabled.
+			 */
+			if (key.offset + datal <= off) {
+				path->slots[0]++;
+				goto process_slot;
+			} else if (key.offset >= off + olen_aligned) {
+				break;
+			}
+			next_key_min_offset = key.offset + datal;
+
+			/* diskl is block aligned, count it in page-sized steps */
+			while (diskl >= PAGE_SIZE) {
+				sum += disko + diskl;
+				counter++;
+				diskl -= PAGE_SIZE;
+			}
+		}
+		btrfs_release_path(path);
+		path->leave_spinning = 0;
+		key.offset = next_key_min_offset;
+	}
+	ret = 0;
+out:
+	btrfs_free_path(path);
+	/* 0 means "no data", the caller then falls back to the old behaviour */
+	if (ret < 0 || counter == 0)
+		return 0;
+	return div64_u64(sum, counter);
+}
+
+/**
  * btrfs_clone() - clone a range from inode file to another
  *
  * @src: Inode to clone from
-- 
2.6.1