From patchwork Mon Jun 21 18:00:13 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Mason X-Patchwork-Id: 107224 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter.kernel.org (8.14.4/8.14.3) with ESMTP id o5LI49m8006385 for ; Mon, 21 Jun 2010 18:04:09 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758181Ab0FUSD1 (ORCPT ); Mon, 21 Jun 2010 14:03:27 -0400 Received: from rcsinet10.oracle.com ([148.87.113.121]:16487 "EHLO rcsinet10.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756444Ab0FUSDQ (ORCPT ); Mon, 21 Jun 2010 14:03:16 -0400 Received: from acsinet15.oracle.com (acsinet15.oracle.com [141.146.126.227]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id o5LI2DTe006650 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 21 Jun 2010 18:02:14 GMT Received: from acsmt355.oracle.com (acsmt355.oracle.com [141.146.40.155]) by acsinet15.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id o5LHBClK004674; Mon, 21 Jun 2010 18:02:10 GMT Received: from abhmt013.oracle.com by acsmt353.oracle.com with ESMTP id 344337021277143216; Mon, 21 Jun 2010 11:00:16 -0700 Received: from localhost (/66.66.148.185) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 21 Jun 2010 11:00:15 -0700 Date: Mon, 21 Jun 2010 14:00:13 -0400 From: Chris Mason To: Edward Shishkin , Jamie Lokier , Edward Shishkin , Mat , LKML , linux-fsdevel@vger.kernel.org, Ric Wheeler , Andrew Morton , Linus Torvalds , The development of BTRFS Subject: Re: Balancing leaves when walking from top to down (was Btrfs:...) Message-ID: <20100621180013.GD17979@think> Mail-Followup-To: Chris Mason , Edward Shishkin , Jamie Lokier , Edward Shishkin , Mat , LKML , linux-fsdevel@vger.kernel.org, Ric Wheeler , Andrew Morton , Linus Torvalds , The development of BTRFS References: <4C07C321.8010000@redhat.com> <4C1B7560.1000806@gmail.com> <20100618155653.GC10919@shareable.org> <4C1BC924.30604@redhat.com> <20100618193555.GV27466@think> <4C1BED56.9010300@redhat.com> <20100621131528.GA17979@think> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20100621131528.GA17979@think> User-Agent: Mutt/1.5.20 (2009-06-14) X-Auth-Type: Internal IP X-Source-IP: acsinet15.oracle.com [141.146.126.227] X-CT-RefId: str=0001.0A090206.4C1FA927.0165:SCFMA922111,ss=1,fgs=0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.3 (demeter.kernel.org [140.211.167.41]); Mon, 21 Jun 2010 18:04:09 +0000 (UTC) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 0d1d966..1f393b0 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2304,12 +2304,17 @@ noinline int btrfs_leaf_free_space(struct btrfs_root *root, return ret; } +/* + * min slot controls the lowest index we're willing to push to the + * right. We'll push up to and including min_slot, but no lower + */ static noinline int __push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, int data_size, int empty, struct extent_buffer *right, - int free_space, u32 left_nritems) + int free_space, u32 left_nritems, + u32 min_slot) { struct extent_buffer *left = path->nodes[0]; struct extent_buffer *upper = path->nodes[1]; @@ -2327,7 +2332,7 @@ static noinline int __push_leaf_right(struct btrfs_trans_handle *trans, if (empty) nr = 0; else - nr = 1; + nr = max_t(u32, 1, min_slot); if (path->slots[0] >= left_nritems) push_space += data_size; @@ -2469,10 +2474,13 @@ out_unlock: * * returns 1 if the push failed because the other node didn't have enough * room, 0 if everything worked out and < 0 if there were major errors. + * + * this will push starting from min_slot to the end of the leaf. It won't + * push any slot lower than min_slot */ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, int data_size, - int empty) + int empty, u32 min_slot) { struct extent_buffer *left = path->nodes[0]; struct extent_buffer *right; @@ -2515,7 +2523,7 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root goto out_unlock; return __push_leaf_right(trans, root, path, data_size, empty, - right, free_space, left_nritems); + right, free_space, left_nritems, min_slot); out_unlock: btrfs_tree_unlock(right); free_extent_buffer(right); @@ -2525,12 +2533,17 @@ out_unlock: /* * push some data in the path leaf to the left, trying to free up at * least data_size bytes. returns zero if the push worked, nonzero otherwise + * + * max_slot can put a limit on how far into the leaf we'll push items. The + * item at 'max_slot' won't be touched. Use (u32)-1 to make us do all the + * items */ static noinline int __push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, int data_size, int empty, struct extent_buffer *left, - int free_space, int right_nritems) + int free_space, u32 right_nritems, + u32 max_slot) { struct btrfs_disk_key disk_key; struct extent_buffer *right = path->nodes[0]; @@ -2549,9 +2562,9 @@ static noinline int __push_leaf_left(struct btrfs_trans_handle *trans, slot = path->slots[1]; if (empty) - nr = right_nritems; + nr = min(right_nritems, max_slot); else - nr = right_nritems - 1; + nr = min(right_nritems - 1, max_slot); for (i = 0; i < nr; i++) { item = btrfs_item_nr(right, i); @@ -2712,10 +2725,14 @@ out: /* * push some data in the path leaf to the left, trying to free up at * least data_size bytes. returns zero if the push worked, nonzero otherwise + * + * max_slot can put a limit on how far into the leaf we'll push items. The + * item at 'max_slot' won't be touched. Use (u32)-1 to make us push all the + * items */ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, int data_size, - int empty) + int empty, u32 max_slot) { struct extent_buffer *right = path->nodes[0]; struct extent_buffer *left; @@ -2762,7 +2779,8 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root } return __push_leaf_left(trans, root, path, data_size, - empty, left, free_space, right_nritems); + empty, left, free_space, right_nritems, + max_slot); out: btrfs_tree_unlock(left); free_extent_buffer(left); @@ -2855,6 +2873,59 @@ static noinline int copy_for_split(struct btrfs_trans_handle *trans, } /* + * double splits happen when we need to insert a big item in the middle + * of a leaf. A double split can leave us with 3 mostly empty leaves: + * leaf: [ slots 0 - N] [ our target ] [ N + 1 - total in leaf ] + * A B C + * + * We avoid this by trying to push the items on either side of our target + * into the adjacent leaves. If all goes well we can avoid the double split + * completely. + */ +static noinline int push_for_double_split(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct btrfs_path *path) +{ + int ret; + int progress = 0; + int slot; + + slot = path->slots[0]; + + /* + * try to push all the items after our slot into the + * right leaf + */ + ret = push_leaf_right(trans, root, path, 1, 0, slot); + if (ret < 0) + return ret; + + if (ret == 0) + progress++; + + /* + * our goal is to get our slot at the start or end of a leaf. If + * we've done so we're done + */ + if (path->slots[0] == 0 || + path->slots[0] == btrfs_header_nritems(path->nodes[0])) + return 0; + + /* try to push all the items before our slot into the next leaf */ + slot = path->slots[0]; + ret = push_leaf_left(trans, root, path, 1, 0, slot); + if (ret < 0) + return ret; + + if (ret == 0) + progress++; + + if (progress) + return 0; + return 1; +} + +/* * split the path's leaf in two, making sure there is at least data_size * available for the resulting leaf level of the path. * @@ -2876,6 +2947,7 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans, int wret; int split; int num_doubles = 0; + int tried_avoid_double = 0; l = path->nodes[0]; slot = path->slots[0]; @@ -2884,12 +2956,13 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans, return -EOVERFLOW; /* first try to make some room by pushing left and right */ - if (data_size && ins_key->type != BTRFS_DIR_ITEM_KEY) { - wret = push_leaf_right(trans, root, path, data_size, 0); + if (data_size) { + wret = push_leaf_right(trans, root, path, data_size, 0, 0); if (wret < 0) return wret; if (wret) { - wret = push_leaf_left(trans, root, path, data_size, 0); + wret = push_leaf_left(trans, root, path, + data_size, 0, (u32)-1); if (wret < 0) return wret; } @@ -2923,6 +2996,12 @@ again: if (mid != nritems && leaf_space_used(l, mid, nritems - mid) + data_size > BTRFS_LEAF_DATA_SIZE(root)) { + if (!tried_avoid_double) { + push_for_double_split(trans, + root, path); + tried_avoid_double = 1; + goto again; + } split = 2; } } @@ -2939,6 +3018,12 @@ again: if (mid != nritems && leaf_space_used(l, mid, nritems - mid) + data_size > BTRFS_LEAF_DATA_SIZE(root)) { + if (!tried_avoid_double) { + push_for_double_split(trans, + root, path); + tried_avoid_double = 1; + goto again; + } split = 2 ; } } @@ -3915,13 +4000,14 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root, extent_buffer_get(leaf); btrfs_set_path_blocking(path); - wret = push_leaf_left(trans, root, path, 1, 1); + wret = push_leaf_left(trans, root, path, 1, 1, (u32)-1); if (wret < 0 && wret != -ENOSPC) ret = wret; if (path->nodes[0] == leaf && btrfs_header_nritems(leaf)) { - wret = push_leaf_right(trans, root, path, 1, 1); + wret = push_leaf_right(trans, root, path, + 1, 1, 0); if (wret < 0 && wret != -ENOSPC) ret = wret; }