From patchwork Fri Sep 22 10:39:02 2023
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 1/8] btrfs: make extent state merges more efficient during insertions
Date: Fri, 22 Sep 2023 11:39:02 +0100
Message-Id: <915e23dfc59be63ff5999ee0e4b2f35c426e3db9.1695333278.git.fdmanana@suse.com>
From: Filipe Manana

When inserting a new extent state record into an io tree that happens to be mergeable, we currently do the following:

1) Insert the extent state record in the io tree's rbtree. This requires going down the tree to find where to insert it, and during the insertion we often need to balance the rbtree;

2) We then check if the previous node is mergeable, so we call rb_prev() to find it, which requires some looping to find the previous node;

3) If the previous node is mergeable, we adjust our node to include the range of the previous node and then delete the previous node from the rbtree, which again may need to balance the rbtree;

4) Then we check if the next node is mergeable with the node we inserted, so we call rb_next(), which requires some looping too. If the next node is indeed mergeable, we expand the range of our node to include the next node's range and then delete the next node from the rbtree, which again may need to balance the tree.

So these are quite a lot of iterations and looping over the rbtree, and some of the operations may need to rebalance the rbtree. This can be made a bit more efficient by:

1) When iterating the rbtree, once we find a node that is mergeable with the node we want to insert, we can just adjust that node's range with the range of the node to insert - this avoids continuing iterating over the tree and deleting a node from the rbtree;

2) If we expand the range of a mergeable node, then we find the next or the previous node, depending on whether we merged a range to the right or to the left of the node we are currently at during the iteration.
This merging works as before: we find the next or previous node with rb_next() or rb_prev() and if that other node is mergeable with the current one, we adjust the range of the current node and remove the other node from the rbtree;

3) Whenever we need to insert the new extent state record it's because we don't have any extent state record in the rbtree which can be merged, so we can remove the call to merge_state() after the insertion, saving rb_next() and rb_prev() calls, which require some looping.

So update the insertion function insert_state() to have this behaviour.

Running dbench for 120 seconds and capturing the execution times of set_extent_bit() at pin_down_extent(), resulted in the following data (time values are in nanoseconds):

Before this change:

Count: 2278299
Range: 0.000 - 4003728.000; Mean: 713.436; Median: 612.000; Stddev: 3606.952
Percentiles: 90th: 1187.000; 95th: 1350.000; 99th: 1724.000
       0.000 -       7.534:       5 |
       7.534 -      35.418:      36 |
      35.418 -     154.403:     273 |
     154.403 -     662.138: 1244016 #####################################################
     662.138 -    2828.745: 1031335 ############################################
    2828.745 -   12074.102:    1395 |
   12074.102 -   51525.930:     806 |
   51525.930 -  219874.955:     162 |
  219874.955 -  938254.688:      22 |
  938254.688 - 4003728.000:       3 |

After this change:

Count: 2275862
Range: 0.000 - 1605175.000; Mean: 678.903; Median: 590.000; Stddev: 2149.785
Percentiles: 90th: 1105.000; 95th: 1245.000; 99th: 1590.000
       0.000 -      10.219:      10 |
      10.219 -      40.957:      36 |
      40.957 -     155.907:     262 |
     155.907 -     585.789: 1127214 ####################################################
     585.789 -    2193.431: 1145134 #####################################################
    2193.431 -    8205.578:    1648 |
    8205.578 -   30689.378:    1039 |
   30689.378 -  114772.699:     362 |
  114772.699 -  429221.537:      52 |
  429221.537 - 1605175.000:      10 |

Maximum duration (range), average duration, percentiles and standard deviation are all better.
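The merge-on-insert idea described above can be sketched outside the kernel as follows. This is a simplified, hypothetical illustration, not btrfs code: a sorted doubly linked list of `struct range` stands in for the io tree's rbtree of extent states, and `insert_range()` plays the role of insert_state(), merging into an adjacent node with equal bits as soon as one is found instead of inserting first and merging afterwards.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent state: an inclusive range with bits. */
struct range {
	uint64_t start, end;	/* inclusive */
	unsigned int bits;
	struct range *prev, *next;
};

static struct range *new_range(uint64_t s, uint64_t e, unsigned int b)
{
	struct range *r = calloc(1, sizeof(*r));

	r->start = s;
	r->end = e;
	r->bits = b;
	return r;
}

/* Merge r->next into r if the two ranges touch and carry the same bits. */
static void merge_next(struct range *r)
{
	struct range *n = r->next;

	if (n && n->start == r->end + 1 && n->bits == r->bits) {
		r->end = n->end;
		r->next = n->next;
		if (n->next)
			n->next->prev = r;
		free(n);
	}
}

/* Insert 'state' into the sorted list, merging with an adjacent node with
 * the same bits as soon as one is encountered during the walk. */
static struct range *insert_range(struct range *head, struct range *state)
{
	struct range *cur = head, *last = NULL;

	for (; cur; last = cur, cur = cur->next) {
		if (state->end + 1 == cur->start && state->bits == cur->bits) {
			/* Mergeable to the left of cur: extend cur downwards,
			 * then check if cur now also touches its predecessor. */
			cur->start = state->start;
			if (cur->prev)
				merge_next(cur->prev);
			free(state);
			return head;
		}
		if (cur->end + 1 == state->start && state->bits == cur->bits) {
			/* Mergeable to the right of cur: extend cur upwards,
			 * then try to also swallow cur's successor. */
			cur->end = state->end;
			merge_next(cur);
			free(state);
			return head;
		}
		if (state->end < cur->start)
			break;	/* insert before cur */
	}
	/* No mergeable neighbour found: link the new node in. */
	state->next = cur;
	state->prev = last;
	if (cur)
		cur->prev = state;
	if (last)
		last->next = state;
	else
		head = state;
	return head;
}
```

Inserting [20,29] between existing [10,19] and [30,39] with the same bits collapses all three into [10,39] in a single walk, without first linking a new node and then deleting merged neighbours.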
Signed-off-by: Filipe Manana
---
 fs/btrfs/extent-io-tree.c | 130 ++++++++++++++++++++++++++------------
 1 file changed, 88 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index ff8e117a1ace..dd9dd473654d 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -327,6 +327,38 @@ static void extent_io_tree_panic(struct extent_io_tree *tree, int err)
 		"locking error: extent tree was modified by another thread while locked");
 }
 
+static void merge_prev_state(struct extent_io_tree *tree, struct extent_state *state)
+{
+	struct extent_state *prev;
+
+	prev = prev_state(state);
+	if (prev && prev->end == state->start - 1 &&
+	    prev->state == state->state) {
+		if (tree->inode)
+			btrfs_merge_delalloc_extent(tree->inode, state, prev);
+		state->start = prev->start;
+		rb_erase(&prev->rb_node, &tree->state);
+		RB_CLEAR_NODE(&prev->rb_node);
+		free_extent_state(prev);
+	}
+}
+
+static void merge_next_state(struct extent_io_tree *tree, struct extent_state *state)
+{
+	struct extent_state *next;
+
+	next = next_state(state);
+	if (next && next->start == state->end + 1 &&
+	    next->state == state->state) {
+		if (tree->inode)
+			btrfs_merge_delalloc_extent(tree->inode, state, next);
+		state->end = next->end;
+		rb_erase(&next->rb_node, &tree->state);
+		RB_CLEAR_NODE(&next->rb_node);
+		free_extent_state(next);
+	}
+}
+
 /*
  * Utility function to look for merge candidates inside a given range.
 * Any extents with matching state are merged together into a single extent in the
@@ -338,31 +370,11 @@ static void extent_io_tree_panic(struct extent_io_tree *tree, int err)
  */
 static void merge_state(struct extent_io_tree *tree, struct extent_state *state)
 {
-	struct extent_state *other;
-
 	if (state->state & (EXTENT_LOCKED | EXTENT_BOUNDARY))
 		return;
 
-	other = prev_state(state);
-	if (other && other->end == state->start - 1 &&
-	    other->state == state->state) {
-		if (tree->inode)
-			btrfs_merge_delalloc_extent(tree->inode, state, other);
-		state->start = other->start;
-		rb_erase(&other->rb_node, &tree->state);
-		RB_CLEAR_NODE(&other->rb_node);
-		free_extent_state(other);
-	}
-	other = next_state(state);
-	if (other && other->start == state->end + 1 &&
-	    other->state == state->state) {
-		if (tree->inode)
-			btrfs_merge_delalloc_extent(tree->inode, state, other);
-		state->end = other->end;
-		rb_erase(&other->rb_node, &tree->state);
-		RB_CLEAR_NODE(&other->rb_node);
-		free_extent_state(other);
-	}
+	merge_prev_state(tree, state);
+	merge_next_state(tree, state);
 }
 
 static void set_state_bits(struct extent_io_tree *tree,
@@ -384,19 +396,26 @@ static void set_state_bits(struct extent_io_tree *tree,
  * Insert an extent_state struct into the tree. 'bits' are set on the
  * struct before it is inserted.
  *
- * This may return -EEXIST if the extent is already there, in which case the
- * state struct is freed.
+ * Returns a pointer to the struct extent_state record containing the range
+ * requested for insertion, which may be the same as the given struct or it
+ * may be an existing record in the tree that was expanded to accommodate the
+ * requested range. In case of an extent_state different from the one that was
+ * given, the latter can be freed or reused by the caller.
+ *
+ * On error it returns an error pointer.
  *
  * The tree lock is not taken internally. This is a utility function and
  * probably isn't what you want to call (see set/clear_extent_bit).
 */
-static int insert_state(struct extent_io_tree *tree,
-			struct extent_state *state,
-			u32 bits, struct extent_changeset *changeset)
+static struct extent_state *insert_state(struct extent_io_tree *tree,
+					 struct extent_state *state,
+					 u32 bits,
+					 struct extent_changeset *changeset)
 {
 	struct rb_node **node;
 	struct rb_node *parent = NULL;
-	const u64 end = state->end;
+	const u64 start = state->start - 1;
+	const u64 end = state->end + 1;
+	const bool try_merge = !(bits & (EXTENT_LOCKED | EXTENT_BOUNDARY));
 
 	set_state_bits(tree, state, bits, changeset);
 
@@ -407,23 +426,40 @@ static int insert_state(struct extent_io_tree *tree,
 		parent = *node;
 		entry = rb_entry(parent, struct extent_state, rb_node);
 
-		if (end < entry->start) {
+		if (state->end < entry->start) {
+			if (try_merge && end == entry->start &&
+			    state->state == entry->state) {
+				if (tree->inode)
+					btrfs_merge_delalloc_extent(tree->inode, state, entry);
+				entry->start = state->start;
+				merge_prev_state(tree, entry);
+				state->state = 0;
+				return entry;
+			}
 			node = &(*node)->rb_left;
-		} else if (end > entry->end) {
+		} else if (state->end > entry->end) {
+			if (try_merge && entry->end == start &&
+			    state->state == entry->state) {
+				if (tree->inode)
+					btrfs_merge_delalloc_extent(tree->inode, state, entry);
+				entry->end = state->end;
+				merge_next_state(tree, entry);
+				state->state = 0;
+				return entry;
+			}
 			node = &(*node)->rb_right;
 		} else {
 			btrfs_err(tree->fs_info,
 			       "found node %llu %llu on insert of %llu %llu",
-			       entry->start, entry->end, state->start, end);
-			return -EEXIST;
+			       entry->start, entry->end, state->start, state->end);
+			return ERR_PTR(-EEXIST);
 		}
 	}
 
 	rb_link_node(&state->rb_node, parent, node);
 	rb_insert_color(&state->rb_node, &tree->state);
 
-	merge_state(tree, state);
-	return 0;
+	return state;
 }
 
 /*
@@ -1133,6 +1169,8 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	 */
 	if (state->start > start) {
 		u64 this_end;
+		struct extent_state *inserted_state;
+
 		if (end < last_start)
 			this_end = end;
 		else
@@ -1148,12 +1186,15 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		 */
 		prealloc->start = start;
 		prealloc->end = this_end;
-		err = insert_state(tree, prealloc, bits, changeset);
-		if (err)
+		inserted_state = insert_state(tree, prealloc, bits, changeset);
+		if (IS_ERR(inserted_state)) {
+			err = PTR_ERR(inserted_state);
 			extent_io_tree_panic(tree, err);
+		}
 
-		cache_state(prealloc, cached_state);
-		prealloc = NULL;
+		cache_state(inserted_state, cached_state);
+		if (inserted_state == prealloc)
+			prealloc = NULL;
 		start = this_end + 1;
 		goto search_again;
 	}
@@ -1356,6 +1397,8 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	 */
 	if (state->start > start) {
 		u64 this_end;
+		struct extent_state *inserted_state;
+
 		if (end < last_start)
 			this_end = end;
 		else
@@ -1373,11 +1416,14 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		 */
 		prealloc->start = start;
 		prealloc->end = this_end;
-		err = insert_state(tree, prealloc, bits, NULL);
-		if (err)
+		inserted_state = insert_state(tree, prealloc, bits, NULL);
+		if (IS_ERR(inserted_state)) {
+			err = PTR_ERR(inserted_state);
 			extent_io_tree_panic(tree, err);
-		cache_state(prealloc, cached_state);
-		prealloc = NULL;
+		}
+		cache_state(inserted_state, cached_state);
+		if (inserted_state == prealloc)
+			prealloc = NULL;
 		start = this_end + 1;
 		goto search_again;
 	}

From patchwork Fri Sep 22 10:39:03 2023
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 2/8] btrfs: update stale comment at extent_io_tree_release()
Date: Fri, 22 Sep 2023 11:39:03 +0100
Message-Id: <1ef2084700a7940676927a5f2b0d8c58515678df.1695333278.git.fdmanana@suse.com>

From: Filipe Manana

There's a comment at extent_io_tree_release() which mentions io btrees, but this function is no longer used only for io btrees. Originally it was added as a static function named clear_btree_io_tree() at transaction.c, in commit 663dfbb07774 ("Btrfs: deal with convert_extent_bit errors to avoid fs corruption"), as it was used only for cleaning one of the io trees that track dirty extent buffers, the dirty_log_pages io tree of a root and the dirty_pages io tree of a transaction.
Later it was renamed and exported, and now it's used to clean up other io trees such as the allocation state io tree of a device or the csums range io tree of a log root.

So remove that comment and replace it with one at the top of the function that is more complete, mentioning what the function does and that it's expected to be called only when a task is sure no one else will need to use the tree anymore, as well as that there should be no locked ranges in the tree and therefore no waiters on its extent state records. Also add an assertion to check that there are no locked extent state records in the tree.

Signed-off-by: Filipe Manana
---
 fs/btrfs/extent-io-tree.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index dd9dd473654d..1ca0827493a6 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -105,6 +105,13 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 	lockdep_set_class(&tree->lock, &file_extent_tree_class);
 }
 
+/*
+ * Empty an io tree, removing and freeing every extent state record from the
+ * tree. This should be called once we are sure no other task can access the
+ * tree anymore, so no tree updates happen after we empty the tree and there
+ * aren't any waiters on any extent state record (EXTENT_LOCKED bit is never
+ * set on any extent state when calling this function).
+ */
 void extent_io_tree_release(struct extent_io_tree *tree)
 {
 	spin_lock(&tree->lock);
@@ -122,10 +129,7 @@ void extent_io_tree_release(struct extent_io_tree *tree)
 		state = rb_entry(node, struct extent_state, rb_node);
 		rb_erase(&state->rb_node, &tree->state);
 		RB_CLEAR_NODE(&state->rb_node);
-		/*
-		 * btree io trees aren't supposed to have tasks waiting for
-		 * changes in the flags of extent states ever.
-		 */
+		ASSERT(!(state->state & EXTENT_LOCKED));
 		ASSERT(!waitqueue_active(&state->wq));
 		free_extent_state(state);

From patchwork Fri Sep 22 10:39:04 2023
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 3/8] btrfs: make wait_extent_bit() static
Date: Fri, 22 Sep 2023 11:39:04 +0100
From: Filipe Manana

The function wait_extent_bit() is not used outside extent-io-tree.c, so make it static. Furthermore, the function doesn't have the 'btrfs_' prefix.

Signed-off-by: Filipe Manana
Reviewed-by: Anand Jain
---
 fs/btrfs/extent-io-tree.c | 4 ++--
 fs/btrfs/extent-io-tree.h | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 1ca0827493a6..033544f79e2b 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -766,8 +766,8 @@ static void wait_on_state(struct extent_io_tree *tree,
  * The range [start, end] is inclusive.
  * The tree lock is taken by this function
  */
-void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, u32 bits,
-		     struct extent_state **cached_state)
+static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+			    u32 bits, struct extent_state **cached_state)
 {
 	struct extent_state *state;
 
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 28c23a23d121..ddcc8bbf1a05 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -192,7 +192,5 @@ int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
 bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 			       u64 *end, u64 max_bytes,
 			       struct extent_state **cached_state);
-void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, u32 bits,
-		     struct extent_state **cached_state);
 
 #endif /* BTRFS_EXTENT_IO_TREE_H */

From patchwork Fri Sep 22 10:39:05 2023
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 4/8] btrfs: remove pointless memory barrier from extent_io_tree_release()
Date: Fri, 22 Sep 2023 11:39:05 +0100

From: Filipe Manana

The memory barrier at extent_io_tree_release() is pointless because:

1) We have just called spin_lock() and that implies a memory barrier;

2) We only change the waitqueue of an extent state record while holding the tree lock - see wait_on_state().

So remove the memory barrier.
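The invariant the patch relies on can be sketched in user space with a hypothetical example (this is not kernel code; `struct wait_registry` and its helpers are illustrative stand-ins for the extent state waitqueue and the tree lock): when every writer of the waiter set takes the same lock that the reader takes, acquiring that lock already orders the accesses, so no separate barrier is needed before the emptiness check.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the waitqueue of an extent state record.
 * All modifications happen under 'lock', mirroring how wait_on_state()
 * only touches the waitqueue while holding the tree lock. */
struct wait_registry {
	pthread_mutex_t lock;
	int nr_waiters;
};

static void registry_init(struct wait_registry *r)
{
	pthread_mutex_init(&r->lock, NULL);
	r->nr_waiters = 0;
}

/* Waiters register and unregister only while holding the lock. */
static void add_waiter(struct wait_registry *r)
{
	pthread_mutex_lock(&r->lock);
	r->nr_waiters++;
	pthread_mutex_unlock(&r->lock);
}

static void remove_waiter(struct wait_registry *r)
{
	pthread_mutex_lock(&r->lock);
	r->nr_waiters--;
	pthread_mutex_unlock(&r->lock);
}

/* Release path: acquiring the lock is enough to see all prior updates
 * made under the same lock - no explicit memory barrier (the user-space
 * analogue of smp_mb()) is required before checking for waiters. */
static bool registry_empty(struct wait_registry *r)
{
	bool empty;

	pthread_mutex_lock(&r->lock);
	empty = (r->nr_waiters == 0);
	pthread_mutex_unlock(&r->lock);
	return empty;
}
```

The same reasoning applies in the kernel: spin_lock() on the tree lock makes all waitqueue updates done under that lock visible, so the smp_mb() before waitqueue_active() was redundant.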
Signed-off-by: Filipe Manana
---
 fs/btrfs/extent-io-tree.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 033544f79e2b..c939c2bc88e5 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -115,12 +115,6 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 void extent_io_tree_release(struct extent_io_tree *tree)
 {
 	spin_lock(&tree->lock);
-	/*
-	 * Do a single barrier for the waitqueue_active check here, the state
-	 * of the waitqueue should not change once extent_io_tree_release is
-	 * called.
-	 */
-	smp_mb();
 	while (!RB_EMPTY_ROOT(&tree->state)) {
 		struct rb_node *node;
 		struct extent_state *state;
@@ -130,6 +124,11 @@ void extent_io_tree_release(struct extent_io_tree *tree)
 		rb_erase(&state->rb_node, &tree->state);
 		RB_CLEAR_NODE(&state->rb_node);
 		ASSERT(!(state->state & EXTENT_LOCKED));
+		/*
+		 * No need for a memory barrier here, as we are holding the tree
+		 * lock and we only change the waitqueue while holding that lock
+		 * (see wait_on_state()).
+		 */
 		ASSERT(!waitqueue_active(&state->wq));
 		free_extent_state(state);

From patchwork Fri Sep 22 10:39:06 2023
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 5/8] btrfs: collapse wait_on_state() to its caller wait_extent_bit()
Date: Fri, 22 Sep 2023 11:39:06 +0100
From: Filipe Manana

The wait_on_state() function is very short and has a single caller, which is wait_extent_bit(), so remove the function and put its code into the caller.

Signed-off-by: Filipe Manana
Reviewed-by: Anand Jain
---
 fs/btrfs/extent-io-tree.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index c939c2bc88e5..700b84fc1588 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -127,7 +127,7 @@ void extent_io_tree_release(struct extent_io_tree *tree)
 		/*
 		 * No need for a memory barrier here, as we are holding the tree
 		 * lock and we only change the waitqueue while holding that lock
-		 * (see wait_on_state()).
+		 * (see wait_extent_bit()).
 		 */
 		ASSERT(!waitqueue_active(&state->wq));
 		free_extent_state(state);
@@ -747,19 +747,6 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
-static void wait_on_state(struct extent_io_tree *tree,
-			  struct extent_state *state)
-		__releases(tree->lock)
-		__acquires(tree->lock)
-{
-	DEFINE_WAIT(wait);
-	prepare_to_wait(&state->wq, &wait, TASK_UNINTERRUPTIBLE);
-	spin_unlock(&tree->lock);
-	schedule();
-	spin_lock(&tree->lock);
-	finish_wait(&state->wq, &wait);
-}
-
 /*
  * Wait for one or more bits to clear on a range in the state tree.
  * The range [start, end] is inclusive.
@@ -797,9 +784,15 @@ static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 			goto out;
 
 		if (state->state & bits) {
+			DEFINE_WAIT(wait);
+
 			start = state->start;
 			refcount_inc(&state->refs);
-			wait_on_state(tree, state);
+			prepare_to_wait(&state->wq, &wait, TASK_UNINTERRUPTIBLE);
+			spin_unlock(&tree->lock);
+			schedule();
+			spin_lock(&tree->lock);
+			finish_wait(&state->wq, &wait);
 			free_extent_state(state);
 			goto again;
 		}

From patchwork Fri Sep 22 10:39:07 2023
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 6/8] btrfs: make extent_io_tree_release() more efficient
Date: Fri, 22 Sep 2023 11:39:07 +0100

From: Filipe Manana

Currently extent_io_tree_release() is a loop that keeps getting the first node in the io tree, using rb_first(), which loops to get to the leftmost node of the rbtree, and then for each node it calls rb_erase(), which often requires rebalancing the rbtree.

We can make this more efficient by using rbtree_postorder_for_each_entry_safe() to free each node without having to delete it from the rbtree and without looping to get the first node.

Signed-off-by: Filipe Manana
---
 fs/btrfs/extent-io-tree.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 700b84fc1588..f2372d6cd304 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -114,14 +114,12 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
  */
 void extent_io_tree_release(struct extent_io_tree *tree)
 {
-	spin_lock(&tree->lock);
-	while (!RB_EMPTY_ROOT(&tree->state)) {
-		struct rb_node *node;
-		struct extent_state *state;
+	struct extent_state *state;
+	struct extent_state *tmp;
 
-		node = rb_first(&tree->state);
-		state = rb_entry(node, struct extent_state, rb_node);
-		rb_erase(&state->rb_node, &tree->state);
+	spin_lock(&tree->lock);
+	rbtree_postorder_for_each_entry_safe(state, tmp, &tree->state, rb_node) {
+		/* Clear node to keep free_extent_state() happy.
*/ RB_CLEAR_NODE(&state->rb_node); ASSERT(!(state->state & EXTENT_LOCKED)); /* @@ -131,9 +129,9 @@ void extent_io_tree_release(struct extent_io_tree *tree) */ ASSERT(!waitqueue_active(&state->wq)); free_extent_state(state); - - cond_resched_lock(&tree->lock); + cond_resched(); } + tree->state = RB_ROOT; spin_unlock(&tree->lock); } From patchwork Fri Sep 22 10:39:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13395526 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D167DCD4F49 for ; Fri, 22 Sep 2023 10:39:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233392AbjIVKjf (ORCPT ); Fri, 22 Sep 2023 06:39:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41866 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233389AbjIVKj0 (ORCPT ); Fri, 22 Sep 2023 06:39:26 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BD2BAB for ; Fri, 22 Sep 2023 03:39:21 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 49B92C433C9 for ; Fri, 22 Sep 2023 10:39:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1695379160; bh=tRyYZUOi+3DivC3JjVadhyIxvmtxptGch0WtJowd2Ow=; h=From:To:Subject:Date:In-Reply-To:References:From; b=WIpGzFUpKTmUEyise7mtNSyCzoN1fQJ4ysNfdjTeYZ5oAoOPVaNoRXzeOe8gmazOW PLUq7SfQhvEZIvY4YXnPZaYe4BKKAYfKAUUCbXJPM6oBau3BTfhRkRyAY2MGbhAWzx wcEKGk6gMiXa1xSS+UXranGqFmVwJzLTR3xM6hBIz0FXZR+6ZXBGjxrZUWM3zyPhps AMcMssVipqK3xPKBS8YT2JRn3Yp46xk9EL/uvNkvO9cKbd63JFW57kK9Aeqr+yVUOX oo/35pm5Sl5n3DCnWeqINN60lx4LB5KmEP8TGtVCCTll9nFcHGkvaytoupOcnQ+eQ6 ssze8yBXAwjCQ== From: 
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 7/8] btrfs: use extent_io_tree_release() to empty dirty log pages
Date: Fri, 22 Sep 2023 11:39:08 +0100
Message-Id: <459c0d25abdfecdc7c57192fa656c6abda11af31.1695333278.git.fdmanana@suse.com>

From: Filipe Manana

When freeing a log tree, during a transaction commit, we clear its dirty
log pages io tree by calling clear_extent_bits() with a range from 0 to
(u64)-1. This iterates the io tree's rbtree and calls rb_erase() on each
node before freeing it, which often triggers rebalance operations on the
rbtree. A better alternative is to use extent_io_tree_release(), which
does no deletions and triggers no rebalances. So use
extent_io_tree_release() instead of clear_extent_bits().

Signed-off-by: Filipe Manana
---
 fs/btrfs/tree-log.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index f4257be56bd3..8687c944451f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3207,8 +3207,7 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,
-			  EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT);
+	extent_io_tree_release(&log->dirty_log_pages);
 	extent_io_tree_release(&log->log_csum_range);
 	btrfs_put_root(log);

From patchwork Fri Sep 22 10:39:09 2023
X-Patchwork-Submitter: Filipe Manana
X-Patchwork-Id: 13395527
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 8/8] btrfs: make sure we cache next state in find_first_extent_bit()
Date: Fri, 22 Sep 2023 11:39:09 +0100
Message-Id: <322c9453accd0e3329331339920099328d42077f.1695333278.git.fdmanana@suse.com>

From: Filipe Manana

Currently, at find_first_extent_bit(), when we are given a cached extent
state whose end offset happens to match the desired range start, we find
the next extent state with next_state() calls starting from that cached
state, and then return it.
We then try to cache that next state by calling cache_state_if_flags(),
but that will not cache the state, because we haven't reset *cached_state
to NULL. So we end up with the cached state unchanged, and if the caller
is iterating over extent states in the io tree, its next call to
find_first_extent_bit() will not use the current cached state, as its end
offset does not match the minimum start range offset. The cached state is
therefore reset, and we have to search the rbtree to find the next
suitable extent state record.

So fix this by resetting the cached state to NULL (and dropping our ref
on it) when we have a suitable cached state and we found a next state by
using next_state() starting from the cached state. With this, callers
that use find_first_extent_bit() to go over all ranges in the io tree do
a single full rbtree search, on the first call only, while every
subsequent call just does next_state() (an rb_next() wrapper) calls,
which is more efficient.

Signed-off-by: Filipe Manana
---
 fs/btrfs/extent-io-tree.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index f2372d6cd304..6e3645afaa38 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -877,10 +877,19 @@ bool find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 		if (state->end == start - 1 && extent_state_in_tree(state)) {
 			while ((state = next_state(state)) != NULL) {
 				if (state->state & bits)
-					goto got_it;
+					break;
 			}
+			/*
+			 * If we found the next extent state, clear cached_state
+			 * so that we can cache the next extent state below and
+			 * avoid future calls going over the same extent state
+			 * again. If we haven't found any, clear as well since
+			 * it's now useless.
+			 */
 			free_extent_state(*cached_state);
 			*cached_state = NULL;
+			if (state)
+				goto got_it;
 			goto out;
 		}
 		free_extent_state(*cached_state);