From patchwork Tue Nov 1 16:15:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027159 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38B50C43217 for ; Tue, 1 Nov 2022 16:16:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230382AbiKAQQI (ORCPT ); Tue, 1 Nov 2022 12:16:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230291AbiKAQQF (ORCPT ); Tue, 1 Nov 2022 12:16:05 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 75E9F1C431 for ; Tue, 1 Nov 2022 09:16:01 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 3100FB81E25 for ; Tue, 1 Nov 2022 16:16:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5D55EC433C1 for ; Tue, 1 Nov 2022 16:15:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319358; bh=tDAA8mxKthMiBQmk7+y+rLW7gD/lsOxJMvLAyvkrj9s=; h=From:To:Subject:Date:In-Reply-To:References:From; b=mlJNDStjU0X9QVcPG6q9E6P6Sz4ZDElUwuLw1cHkBj9bLXHhJg3N/cZBlCIqI+aKJ gBTxRDOhR5gRo/nZYsZQ6kj0fJauVPajn4+YvIlAThHpIdI534KXqiR8mPb8AMW78L bPdtmDDuRdLHi6NGzyzVojK2KC8DwloChAs1jic6Dg5rK4ru3cGjm3OiHrLWsDNLPS QpIFnHdhkE4SLjC/IPA8d5Amrd8bC5uZbljvBPB6Upx+GGkPCq6UMwBJhJ/ydmb3du ypB+qTaJJ+6bWcHFxSklSUt3VxszkUip+VPi9UFzCWulF407VnF+6A6Ebusd0k+8Ym teLOv8nR0Ae5w== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 01/18] btrfs: fix inode list leak during backref walking at resolve_indirect_refs() Date: Tue, 1 Nov 2022 16:15:37 +0000 Message-Id: <87e27f27fb2ebf760e574f97fd2d2f7d37f50350.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During backref walking, at resolve_indirect_refs(), if we get an error we jump to the 'out' label and call ulist_free() on the 'parents' ulist, which frees all the elements in the ulist - however that does not free any inode lists that may be attached to elements, through the 'aux' field of a ulist node, so we end up leaking lists if we have any attached to the unodes. Fix this by calling free_leaf_list() instead of ulist_free() when we exit from resolve_indirect_refs(). The static function free_leaf_list() is moved up for this to be possible and it's slightly simplified by removing unnecessary code. Fixes: 3301958b7c1d ("Btrfs: add inodes before dropping the extent lock in find_all_leafs") Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 36 +++++++++++++++++------------------- 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 013c2c085229..82c4cad0ca00 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -678,6 +678,18 @@ unode_aux_to_inode_list(struct ulist_node *node) return (struct extent_inode_elem *)(uintptr_t)node->aux; } +static void free_leaf_list(struct ulist *ulist) +{ + struct ulist_node *node; + struct ulist_iterator uiter; + + ULIST_ITER_INIT(&uiter); + while ((node = ulist_next(ulist, &uiter))) + free_inode_elem_list(unode_aux_to_inode_list(node)); + + ulist_free(ulist); +} + /* * We maintain three separate rbtrees: one for direct refs, one for * indirect refs which have a key, and one for indirect refs which do not @@ -791,7 +803,11 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, cond_resched(); } out: - ulist_free(parents); + /* + * We may have inode lists attached to refs in the parents ulist, so we + * must free them before freeing the ulist and its refs. + */ + free_leaf_list(parents); return ret; } @@ -1621,24 +1637,6 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, return ret; } -static void free_leaf_list(struct ulist *blocks) -{ - struct ulist_node *node = NULL; - struct extent_inode_elem *eie; - struct ulist_iterator uiter; - - ULIST_ITER_INIT(&uiter); - while ((node = ulist_next(blocks, &uiter))) { - if (!node->aux) - continue; - eie = unode_aux_to_inode_list(node); - free_inode_elem_list(eie); - node->aux = 0; - } - - ulist_free(blocks); -} - /* * Finds all leafs with a reference to the specified combination of bytenr and * offset. key_list_head will point to a list of corresponding keys (caller must From patchwork Tue Nov 1 16:15:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027156 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 845B0C43217 for ; Tue, 1 Nov 2022 16:16:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230350AbiKAQQD (ORCPT ); Tue, 1 Nov 2022 12:16:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53476 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230291AbiKAQQB (ORCPT ); Tue, 1 Nov 2022 12:16:01 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B98F11C903 for ; Tue, 1 Nov 2022 09:16:00 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 560BE6118E for ; Tue, 1 Nov 2022 16:16:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 43080C433B5 for ; Tue, 1 Nov 2022 16:15:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319359; bh=5ww87WV0U+VPLGbOTyOPqfdHGHsocTgBs4KfbOxmIFQ=; h=From:To:Subject:Date:In-Reply-To:References:From; b=eNsBEDjprZIbTexgk1sny0COSKoxO9zCGSootJ8kXya5n4+yYW+gzR4lbkB+czrL4 421ZVE1+MTeirMC3lyML3FB9TwSdV3qmCd/ZCmCMLuNUapBcTCKucDz4lEH0Sv9rTd kphWAKrmEQcKqnJvip7ubTMHLS0/O3cDGChyfLLVcCNaXAirqAHFiDKBVega6wwEeb X4Fsr+mUrEq1twjssANcV0TMT0sX00gtQk8QHjhl6encJn5X0IH1+M33jezVoIZY+E 97eqw6KyaPuEM6BEqqn7YkFM7MFn3n5I5UXjv6CVNFulsF4IbYrM7xPsf+hk2aedzL 7pBiYeNXNuQbA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 02/18] btrfs: fix inode list leak during backref walking at find_parent_nodes() Date: Tue, 1 Nov 2022 16:15:38 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During backref walking, at find_parent_nodes(), if we are dealing with a data extent and we get an error while resolving the indirect backrefs, at resolve_indirect_refs(), or in the while loop that iterates over the refs in the direct refs rbtree, we end up leaking the inode lists attached to the direct refs we have in the direct refs rbtree that were not yet added to the refs ulist passed as argument to find_parent_nodes(). Since they were not yet added to the refs ulist and prelim_release() does not free the lists, on error the caller can only free the lists attached to the refs that were added to the refs ulist, all the remaining refs get their inode lists never freed, therefore leaking their memory. Fix this by having prelim_release() always free any attached inode list to each ref found in the rbtree, and have find_parent_nodes() set the ref's inode list to NULL once it transfers ownership of the inode list to a ref added to the refs ulist passed to find_parent_nodes(). Fixes: 86d5f9944252 ("btrfs: convert prelimary reference tracking to use rbtrees") Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 82c4cad0ca00..04cb608e7cfb 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -319,8 +319,10 @@ static void prelim_release(struct preftree *preftree) struct prelim_ref *ref, *next_ref; rbtree_postorder_for_each_entry_safe(ref, next_ref, - &preftree->root.rb_root, rbnode) + &preftree->root.rb_root, rbnode) { + free_inode_elem_list(ref->inode_list); free_pref(ref); + } preftree->root = RB_ROOT_CACHED; preftree->count = 0; @@ -1596,6 +1598,12 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, if (ret < 0) goto out; ref->inode_list = eie; + /* + * We transferred the list ownership to the ref, + * so set to NULL to avoid a double free in case + * an error happens after this. + */ + eie = NULL; } ret = ulist_add_merge_ptr(refs, ref->parent, ref->inode_list, @@ -1621,6 +1629,14 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, eie->next = ref->inode_list; } eie = NULL; + /* + * We have transferred the inode list ownership from + * this ref to the ref we added to the 'refs' ulist. + * So set this ref's inode list to NULL to avoid + * use-after-free when our caller uses it or double + * frees in case an error happens before we return. + */ + ref->inode_list = NULL; } cond_resched(); } From patchwork Tue Nov 1 16:15:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027158 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A16EC433FE for ; Tue, 1 Nov 2022 16:16:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230377AbiKAQQH (ORCPT ); Tue, 1 Nov 2022 12:16:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229528AbiKAQQF (ORCPT ); Tue, 1 Nov 2022 12:16:05 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 47D851C903 for ; Tue, 1 Nov 2022 09:16:03 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id F18C0B81D9F for ; Tue, 1 Nov 2022 16:16:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 27F77C43470 for ; Tue, 1 Nov 2022 16:15:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319360; bh=sy8AWqXshzSDW/g0k8y2ZkyclJ8C551aRGLu3Z4hfYo=; h=From:To:Subject:Date:In-Reply-To:References:From; b=f6+ZXZbCwM3Gs7d10CC/P5j4wowmK31tBClG7bzcjtYT9MG1tX4Z0pJWtFNmKGNja fyaQfVUcBh67SfhfSOeqIm2r3cfS6qliCqblieGBjfkNQklkfruACSO/AHnPCoMyGc sB6IpTPrwg6rggSeQP2sLNiuNgg1/OSecmDL5u2Komfnnhve+dJSLltuI4q4oRcBGu dTr9Y0CuqJp3LnuK2gHgYEyjDPFPMh2gbfkMJpv/WdgOx7U039SIRjDCxyNdi57WGx synyuzLyMplXMk83tTBmwEAzIWkdFDfUFmuIpGLjmU7EfBk3KWOHYxcgk/IMK9QVn+ uSzxD/R12slrA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 03/18] btrfs: fix ulist leaks in error paths of qgroup self tests Date: Tue, 1 Nov 2022 16:15:39 +0000 Message-Id: <7716b3fb040152c7dcd4531a0ddc8c1abae31ac0.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana In the test_no_shared_qgroup() and test_multiple_refs() qgroup self tests, if we fail to add the tree ref, remove the extent item or remove the extent ref, we are returning from the test function without freeing the "old_roots" ulist that was allocated by the previous calls to btrfs_find_all_roots(). Fix that by calling ulist_free() before returning. Fixes: 442244c96332 ("btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.") Signed-off-by: Filipe Manana --- fs/btrfs/tests/qgroup-tests.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/tests/qgroup-tests.c b/fs/btrfs/tests/qgroup-tests.c index 94b04f10f61c..96a70ce36f79 100644 --- a/fs/btrfs/tests/qgroup-tests.c +++ b/fs/btrfs/tests/qgroup-tests.c @@ -234,8 +234,10 @@ static int test_no_shared_qgroup(struct btrfs_root *root, ret = insert_normal_tree_ref(root, nodesize, nodesize, 0, BTRFS_FS_TREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + } ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -268,8 +270,10 @@ static int test_no_shared_qgroup(struct btrfs_root *root, } ret = remove_extent_item(root, nodesize, nodesize); - if (ret) + if (ret) { + ulist_free(old_roots); return -EINVAL; + } ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -331,8 +335,10 @@ static int test_multiple_refs(struct btrfs_root *root, ret = insert_normal_tree_ref(root, nodesize, nodesize, 0, BTRFS_FS_TREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + } ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -364,8 +370,10 @@ static int test_multiple_refs(struct btrfs_root *root, ret = add_tree_ref(root, nodesize, nodesize, 0, BTRFS_FIRST_FREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + } ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -403,8 +411,10 @@ static int test_multiple_refs(struct btrfs_root *root, ret = remove_extent_ref(root, nodesize, nodesize, 0, BTRFS_FIRST_FREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + } ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { From patchwork Tue Nov 1 16:15:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027157 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72C5BC4332F for ; Tue, 1 Nov 2022 16:16:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230361AbiKAQQG (ORCPT ); Tue, 1 Nov 2022 12:16:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230351AbiKAQQD (ORCPT ); Tue, 1 Nov 2022 12:16:03 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 82D001C43C for ; Tue, 1 Nov 2022 09:16:02 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1E6B36118E for ; Tue, 1 Nov 2022 16:16:02 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0D89AC433B5 for ; Tue, 1 Nov 2022 16:16:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319361; bh=XVNtSnZhdJyYdAQWZZa0Oaxass+PiA9YX0acKMDKj7U=; h=From:To:Subject:Date:In-Reply-To:References:From; b=AfiV6OLFKsktuiADrAsYVizM54DEMjXpnK63sKSKa9/bSvGg4dONsqHGjPqG+cW97 sWMQMefCe7+M8oQUxQS7Sy6tZfa8OCwlYQgZSXD12dXlBI8RiwhtbakGZGSfHNkFn+ CmlMD+K4KsY7nD9Rc04v7/uGEq12jf0bsBKxztwIR5TYl0IBjxHZizq1MeR4BOonzI /PB8hatmczEl1S7iMo81IJZHDk2yH90u1Y7m34eHMQu6Jbb9TSsnMc3IigJMJiKxi8 B5jHjn7qoL74S2Uf7S5kUgSFQCoFIwogFDGDt5CwFSe2l0UFWpOsyMBv6XxaXAnqL/ YldgJldpoRQbg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 04/18] btrfs: remove pointless and double ulist frees in error paths of qgroup tests Date: Tue, 1 Nov 2022 16:15:40 +0000 Message-Id: <7ceae212eaf5e9076153ae79914ab11d495164ad.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Several places in the qgroup self tests follow the pattern of freeing the ulist pointer they passed to btrfs_find_all_roots() if the call to that function returned an error. That is pointless because that function always frees the ulist in case it returns an error. Also In some places like at test_multiple_refs(), after a call to btrfs_qgroup_account_extent() we also leave "old_roots" and "new_roots" pointing to ulists that were freed, because btrfs_qgroup_account_extent() has freed those ulists, and if after that the next call to btrfs_find_all_roots() fails, we call ulist_free() on the "old_roots" ulist again, resulting in a double free. So remove those calls to reduce the code size and avoid double ulist free in case of an error. Signed-off-by: Filipe Manana --- fs/btrfs/tests/qgroup-tests.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/tests/qgroup-tests.c b/fs/btrfs/tests/qgroup-tests.c index 96a70ce36f79..65b65d55d1f6 100644 --- a/fs/btrfs/tests/qgroup-tests.c +++ b/fs/btrfs/tests/qgroup-tests.c @@ -227,7 +227,6 @@ static int test_no_shared_qgroup(struct btrfs_root *root, */ ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { - ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } @@ -242,7 +241,6 @@ static int test_no_shared_qgroup(struct btrfs_root *root, ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); - ulist_free(new_roots); test_err("couldn't find old roots: %d", ret); return ret; } @@ -254,17 +252,18 @@ static int test_no_shared_qgroup(struct btrfs_root *root, return ret; } + /* btrfs_qgroup_account_extent() always frees the ulists passed to it. */ + old_roots = NULL; + new_roots = NULL; + if (btrfs_verify_qgroup_counts(fs_info, BTRFS_FS_TREE_OBJECTID, nodesize, nodesize)) { test_err("qgroup counts didn't match expected values"); return -EINVAL; } - old_roots = NULL; - new_roots = NULL; ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { - ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } @@ -278,7 +277,6 @@ static int test_no_shared_qgroup(struct btrfs_root *root, ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); - ulist_free(new_roots); test_err("couldn't find old roots: %d", ret); return ret; } @@ -328,7 +326,6 @@ static int test_multiple_refs(struct btrfs_root *root, ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { - ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } @@ -343,7 +340,6 @@ static int test_multiple_refs(struct btrfs_root *root, ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); - ulist_free(new_roots); test_err("couldn't find old roots: %d", ret); return ret; } @@ -363,7 +359,6 @@ static int test_multiple_refs(struct btrfs_root *root, ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { - ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } @@ -378,7 +373,6 @@ static int test_multiple_refs(struct btrfs_root *root, ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); - ulist_free(new_roots); test_err("couldn't find old roots: %d", ret); return ret; } @@ -404,7 +398,6 @@ static int test_multiple_refs(struct btrfs_root *root, ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); if (ret) { - ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } @@ -419,7 +412,6 @@ static int test_multiple_refs(struct btrfs_root *root, ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { ulist_free(old_roots); - ulist_free(new_roots); test_err("couldn't find old roots: %d", ret); return ret; } From patchwork Tue Nov 1 16:15:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027161 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C350C4332F for ; Tue, 1 Nov 2022 16:16:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230393AbiKAQQJ (ORCPT ); Tue, 1 Nov 2022 12:16:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230364AbiKAQQH (ORCPT ); Tue, 1 Nov 2022 12:16:07 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67E4E1C90E for ; Tue, 1 Nov 2022 09:16:03 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id F28E3611DA for ; Tue, 1 Nov 2022 16:16:02 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E6D33C4347C for ; Tue, 1 Nov 2022 16:16:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319362; bh=5ood/zz0uWAJhNxB/ehvZP6xcqECuC8Jj+qtE+otfaU=; h=From:To:Subject:Date:In-Reply-To:References:From; b=O9OxuyxuYj14+vH2raAKOdy1BwnRmIKfUlT3HMISTaHhJl9v20H51H7RdfTq+XS5/ 6Az9GgjNX1a0mNaHUe4rG3iox48IjYIhEHRx6acsZnGm6y4j70YCCsu+cI08NzrzeL XnMIYrAwsBnsEXhYjl/xXtdDQreLjOzGWr4oWZ1bxfhmk9cnIy/d2E6JDDQ0vfTp44 60d7cqNFEqkWiwXPE/sQBQireP8VvCx7Xb79Q7lMLC/SS2moTq0IRo981BZrnuqs+u WOHg4a4sXT4jq/zxzNlSuZAkocOw6Bn9S/6GIMyWLK7Xt9Mrc8f+/LUHqUqXSULDqG 1wL5s0IsGAEyw== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 05/18] btrfs: send: avoid unnecessary path allocations when finding extent clone Date: Tue, 1 Nov 2022 16:15:41 +0000 Message-Id: <8a45ac7ffef6caedbbee1a22bdbe1c7049091caa.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When looking for an extent clone, at find_extent_clone(), we start by allocating a path and then check for cases where we can't have clones and exit immediately in those cases. It's a waste of time to allocate the path before those cases, so reorder the logic so that we check for those cases before allocating the path. Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 50063ac83830..2226296ca691 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1365,40 +1365,35 @@ static int find_extent_clone(struct send_ctx *sctx, int compressed; u32 i; - tmp_path = alloc_path_for_send(); - if (!tmp_path) - return -ENOMEM; - - /* We only use this path under the commit sem */ - tmp_path->need_commit_sem = 0; - if (data_offset >= ino_size) { /* * There may be extents that lie behind the file's size. * I at least had this in combination with snapshotting while * writing large files. */ - ret = 0; - goto out; + return 0; } - fi = btrfs_item_ptr(eb, path->slots[0], - struct btrfs_file_extent_item); + fi = btrfs_item_ptr(eb, path->slots[0], struct btrfs_file_extent_item); extent_type = btrfs_file_extent_type(eb, fi); - if (extent_type == BTRFS_FILE_EXTENT_INLINE) { - ret = -ENOENT; - goto out; - } - compressed = btrfs_file_extent_compression(eb, fi); + if (extent_type == BTRFS_FILE_EXTENT_INLINE) + return -ENOENT; - num_bytes = btrfs_file_extent_num_bytes(eb, fi); disk_byte = btrfs_file_extent_disk_bytenr(eb, fi); - if (disk_byte == 0) { - ret = -ENOENT; - goto out; - } + if (disk_byte == 0) + return -ENOENT; + + compressed = btrfs_file_extent_compression(eb, fi); + num_bytes = btrfs_file_extent_num_bytes(eb, fi); logical = disk_byte + btrfs_file_extent_offset(eb, fi); + tmp_path = alloc_path_for_send(); + if (!tmp_path) + return -ENOMEM; + + /* We only use this path under the commit sem */ + tmp_path->need_commit_sem = 0; + down_read(&fs_info->commit_root_sem); ret = extent_from_logical(fs_info, disk_byte, tmp_path, &found_key, &flags); From patchwork Tue Nov 1 16:15:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027162 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 368E5C43217 for ; Tue, 1 Nov 2022 16:16:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230383AbiKAQQK (ORCPT ); Tue, 1 Nov 2022 12:16:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53564 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230370AbiKAQQH (ORCPT ); Tue, 1 Nov 2022 12:16:07 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4478C1C90D for ; Tue, 1 Nov 2022 09:16:04 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D56A06118E for ; Tue, 1 Nov 2022 16:16:03 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CC322C433C1 for ; Tue, 1 Nov 2022 16:16:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319363; bh=Qtx23bpKTz2je904upFct8NcI+N14XP9/xcmRzw/M08=; h=From:To:Subject:Date:In-Reply-To:References:From; b=Tf/1mNqftqIHbxJ6R4ac+LV10hhrlZdJoYbJMUPJJG5yAbQXem+ds37gY+3dLnS5x dMufFZ/VMDw8cKD0Zb+8HoI0da/J+kpSM5TTKvoMC72EKJBFgT5XpDwG0QEqL34gEb J5xV+JiXYuAFEqlQPDd00q9HJB4+nUaxmNYmxUfZnrRfpxSorJwvQwF53Z1w/I6ZOm GTvyBAjzUMa+uBA8CxvT/eTF3etpMkxiCZ2MCj/NGq8va1r/xsw1/s5JbsO0DgjVXA Bmiz6K2ZxaYOQOaLsMiB5d/A16l/GQHL1XphWYaNglDVEp/pJJqdWRnhqAXeIkAvdK 8qmlLnKu2iPhw== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 06/18] btrfs: send: update comment at find_extent_clone() Date: Tue, 1 Nov 2022 16:15:42 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana We have this unclear comment at find_extent_clone() about extents starting at a file offset greater than or equals to the i_size of the inode. It's not really informative and it's misleading, since it mentions the author found such extents with snapshots and large files. Such extents are a result of fallocate with FALLOC_FL_KEEP_SIZE and there is no relation to snapshots or large files (all write paths update the i_size before inserting a new file extent item). So update the comment to be precise about it and why we don't bother looking for clone sources in that case. Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 2226296ca691..63f2ac33e85b 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1365,14 +1365,14 @@ static int find_extent_clone(struct send_ctx *sctx, int compressed; u32 i; - if (data_offset >= ino_size) { - /* - * There may be extents that lie behind the file's size. - * I at least had this in combination with snapshotting while - * writing large files. - */ + /* + * With fallocate we can get prealloc extents beyond the inode's i_size, + * so we don't do anything here because clone operations can not clone + * to a range beyond i_size without increasing the i_size of the + * destination inode. + */ + if (data_offset >= ino_size) return 0; - } fi = btrfs_item_ptr(eb, path->slots[0], struct btrfs_file_extent_item); extent_type = btrfs_file_extent_type(eb, fi); From patchwork Tue Nov 1 16:15:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027160 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0903C4321E for ; Tue, 1 Nov 2022 16:16:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230291AbiKAQQI (ORCPT ); Tue, 1 Nov 2022 12:16:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53542 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230357AbiKAQQF (ORCPT ); Tue, 1 Nov 2022 12:16:05 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2FBD61C43C for ; Tue, 1 Nov 2022 09:16:05 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id C0DB761668 for ; Tue, 1 Nov 2022 16:16:04 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B163AC433B5 for ; Tue, 1 Nov 2022 16:16:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319364; bh=3K3P9e8w5JnraVT1Fw84V2xWefAooN0iMB8CHdOJDYw=; h=From:To:Subject:Date:In-Reply-To:References:From; b=uTjggoyS60PUZ5NjtCrZ2lAvWjBs5/sI98JX5QHOGaKNWGi2Ko+IswYQABF6rodCb WT3nfG8Y36k5ZCT+fFGcBJBtLC93LJHNW5vtgHvq3+wWgR0pv5llfQ0h4huZc+zOc9 7t7lIkRSQEgMtkdiSypuB4qiEHSgZw2S51pjFfE6YqdILhEU/mQccw81MpUGSN+k62 Pgh67Bt1HUUefO7kd2x3blcQ3QwbdZuNwSXmPWg0eFalDBbi3oCPxhQD8x3NKsXxdy 6DsTMRcai3ITOUGmcMOAwSQ1nB9YvKVH+Yjbwh/L94SlHWah3EzmkfFALN8IvukVQm fIXEg/fq4cbCA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 07/18] btrfs: send: drop unnecessary backref context field initializations Date: Tue, 1 Nov 2022 16:15:43 +0000 Message-Id: <1f57dc1e48aa4b4fe4701e72963353d1ac4e4d52.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At find_extent_clone() we are initializing to zero the 'found_itself' and 'found' fields of the backref context before we use it but we have already initialized the structure to zeroes when we declared it on stack, so it's pointless to initialize those fields and they are unnecessarily increasing the object text size with two "mov" instructions (x86_64). Similarly make the 'extent_len' initialization more clear by using an if- -then-else instead of a double assignment to it in case the extent's end crosses the i_size boundary. Before this change: $ size fs/btrfs/send.o text data bss dec hex filename 68694 4252 16 72962 11d02 fs/btrfs/send.o After this change: $ size fs/btrfs/send.o text data bss dec hex filename 68678 4252 16 72946 11cf2 fs/btrfs/send.o Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 63f2ac33e85b..61496d3e1355 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1432,11 +1432,8 @@ static int find_extent_clone(struct send_ctx *sctx, } backref_ctx.sctx = sctx; - backref_ctx.found = 0; backref_ctx.cur_objectid = ino; backref_ctx.cur_offset = data_offset; - backref_ctx.found_itself = 0; - backref_ctx.extent_len = num_bytes; /* * The last extent of a file may be too large due to page alignment. @@ -1445,6 +1442,8 @@ static int find_extent_clone(struct send_ctx *sctx, */ if (data_offset + num_bytes >= ino_size) backref_ctx.extent_len = ino_size - data_offset; + else + backref_ctx.extent_len = num_bytes; /* * Now collect all backrefs. From patchwork Tue Nov 1 16:15:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027163 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE565C43219 for ; Tue, 1 Nov 2022 16:16:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230394AbiKAQQK (ORCPT ); Tue, 1 Nov 2022 12:16:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53596 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230384AbiKAQQI (ORCPT ); Tue, 1 Nov 2022 12:16:08 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1C68F1C919 for ; Tue, 1 Nov 2022 09:16:06 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id A51C96166F for ; Tue, 1 Nov 2022 16:16:05 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 97567C433C1 for ; Tue, 1 Nov 2022 16:16:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319365; bh=kxHUVuPlHxXy2xDMayvGUUmQn57kpjF4kaPqZJaQrjc=; h=From:To:Subject:Date:In-Reply-To:References:From; b=Zbi4CH3eLFxxR+MPkjYnkegwc+GGVn/f0eZ0J31npOhvtJESFMJWencCV0kC77mC5 V3YZoIJj1Jm9XZRg1IUSkcdUcO1KKRlGalNmRTL9O7n9QXkx8f+wkoSZI2CK4fUsQj y+toPmFkkbJJI/OVWHG9xnQLeq9gdOM6Yg5a1MCHcUdsVOk/JNha26Ks997M5+Rlgi h2H0ML0753mOss8Y2StnbTULw6l0Y3SFRYvvPy7GaOXOEwdLa3v6kOuZVrxLoRzudr ebhWSYGNv77FRlu1kbYM+5fWk1d5bZkkv7GkMWLz+9fskrRpuQCKRmSA+mBSFkK1QR jQ0+OzTm+QIgw== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 08/18] btrfs: send: avoid unnecessary backref lookups when finding clone source Date: Tue, 1 Nov 2022 16:15:44 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At find_extent_clone(), unless we are given an inline extent, a file extent item that represents hole or an extent that starts beyond the i_size, we always do backref walking to look for clone sources, unless if we have more than SEND_MAX_EXTENT_REFS (64) known references on the extent. However if we know we only have one reference in the extent item and only one clone source (the send root), then it's pointless to do the backref walking to search for clone sources, as we can't clone from any other root. So skip the backref walking in that case. The following test was run on a non-debug kernel (Debian's default kernel config): $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT # Create an extent tree that's not too small and none of the # extents is shared. for ((i = 1; i <= 50000; i++)); do xfs_io -f -c "pwrite 0 4K" $MNT/file_$i > /dev/null echo -ne "\r$i files created..." done echo btrfs subvolume snapshot -r $MNT $MNT/snap start=$(date +%s%N) btrfs send $MNT/snap > /dev/null end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo -e "\nsend took $dur milliseconds" umount $MNT Before this change: send took 5389 milliseconds After this change: send took 4519 milliseconds (-16.1%) Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 61496d3e1355..35c12fc7a864 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1354,6 +1354,7 @@ static int find_extent_clone(struct send_ctx *sctx, u64 disk_byte; u64 num_bytes; u64 extent_item_pos; + u64 extent_refs; u64 flags = 0; struct btrfs_file_extent_item *fi; struct extent_buffer *eb = path->nodes[0]; @@ -1408,14 +1409,22 @@ static int find_extent_clone(struct send_ctx *sctx, ei = btrfs_item_ptr(tmp_path->nodes[0], tmp_path->slots[0], struct btrfs_extent_item); + extent_refs = btrfs_extent_refs(tmp_path->nodes[0], ei); /* * Backreference walking (iterate_extent_inodes() below) is currently * too expensive when an extent has a large number of references, both * in time spent and used memory. So for now just fallback to write * operations instead of clone operations when an extent has more than * a certain amount of references. + * + * Also, if we have only one reference and only the send root as a clone + * source - meaning no clone roots were given in the struct + * btrfs_ioctl_send_args passed to the send ioctl - then it's our + * reference and there's no point in doing backref walking which is + * expensive, so exit early. */ - if (btrfs_extent_refs(tmp_path->nodes[0], ei) > SEND_MAX_EXTENT_REFS) { + if ((extent_refs == 1 && sctx->clone_roots_cnt == 1) || + extent_refs > SEND_MAX_EXTENT_REFS) { ret = -ENOENT; goto out; } From patchwork Tue Nov 1 16:15:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027164 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82999C433FE for ; Tue, 1 Nov 2022 16:16:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230405AbiKAQQL (ORCPT ); Tue, 1 Nov 2022 12:16:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230386AbiKAQQI (ORCPT ); Tue, 1 Nov 2022 12:16:08 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAE9A1C903 for ; Tue, 1 Nov 2022 09:16:06 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 8B95F61668 for ; Tue, 1 Nov 2022 16:16:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7C85CC433B5 for ; Tue, 1 Nov 2022 16:16:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319366; bh=PM5W93Tt6FMQWHy8RTa1h9ZPc0t9lcX/43ddjfFAoZI=; h=From:To:Subject:Date:In-Reply-To:References:From; b=ROW/4J2MD9pqA4tLKlKRjgHHDmk8tql4bD0zJ2kTQ4CoHY3SudvJJ1BujOfnoAWP9 IBTM2FwteLvk8RtIyucFHnvdSaiXOo2d2Ku6od9poGPgsOHWIQq3AeR4jPiCKXKsA2 P6ilMiCmJ/ZxRnRhm/M4Ypn/M1pWgx0GzeLvIV5ixKk4zlbdg2zJG85N/0dtytDCFX 9CCYBrH8FspHUhfu8DgY/uT/mTeu5XIfh6UmmgHUfEJnLYPKuDuvIeM/cVSiSeADtG vt2VZgdwbHpBa3iIJK7urWxPSPVbWI+vuK3wd4Hy4djWoe4owotHnl+Yk5gOK7UUxL pSMNK/3UOZc2w== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 09/18] btrfs: send: optimize clone detection to increase extent sharing Date: Tue, 1 Nov 2022 16:15:45 +0000 Message-Id: <47473c33c761413af35563e02150acc9281fdcaf.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Currently send does not do the best decisions when it comes to decide between multiple clone sources, which results in clone operations for partial extent ranges, which has the following disadvantages: 1) We get less shared extents at the destination; 2) We have to read more data during the send operation and emit more write commands. Besides not being optimal behaviour, it also breaks user expectations and is often reported by users, with a recent example in the Link tag at the bottom of this change log. Part of the reason for this non-optimal behaviour is that the backref walking code does not provide information about the length of the file extent items that were found for each backref, so send is blind about which backref is the best to chose as a cloning source. The other existing reasons are just silliness, namely always prefering the inode with the lowest number when multiple are found for the same root and when we can clone from multiple roots, always prefer the send root over any of the other clone roots. This does not make any sense since any inode or root is fine and as good as any other inode/root. Fix this by making backref walking pass information about the number of bytes referenced by each file extent item and then have send's backref callback pick the inode with the highest number of bytes for each root. Finally select the root from which we can clone more bytes from. Example reproducer: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT xfs_io -f -c "pwrite -S 0xab -b 2M 0 2M" $MNT/foo cp --reflink=always $MNT/foo $MNT/bar cp --reflink=always $MNT/foo $MNT/baz sync # Overwrite the second half of file foo. xfs_io -c "pwrite -S 0xcd -b 1M 1M 1M" $MNT/foo sync echo echo "*** fiemap in the original filesystem ***" echo xfs_io -c "fiemap -v" $MNT/foo xfs_io -c "fiemap -v" $MNT/bar xfs_io -c "fiemap -v" $MNT/baz echo btrfs filesystem du $MNT btrfs subvolume snapshot -r $MNT $MNT/snap btrfs send -f /tmp/send_stream $MNT/snap umount $MNT mkfs.btrfs -f $DEV &> /dev/null mount $DEV $MNT btrfs receive -f /tmp/send_stream $MNT echo echo "*** fiemap in the new filesystem ***" echo xfs_io -r -c "fiemap -v" $MNT/snap/foo xfs_io -r -c "fiemap -v" $MNT/snap/bar xfs_io -r -c "fiemap -v" $MNT/snap/baz echo btrfs filesystem du $MNT rm -f /tmp/send_stream rm -f /tmp/snap.fssum umount $MNT Before this change: $ ./test.sh (...) *** fiemap in the original filesystem *** /mnt/sdi/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2047]: 26624..28671 2048 0x2000 1: [2048..4095]: 30720..32767 2048 0x1 /mnt/sdi/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..4095]: 26624..30719 4096 0x2001 /mnt/sdi/baz: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..4095]: 26624..30719 4096 0x2001 Total Exclusive Set shared Filename 2.00MiB 1.00MiB - /mnt/sdi/foo 2.00MiB 0.00B - /mnt/sdi/bar 2.00MiB 0.00B - /mnt/sdi/baz 6.00MiB 1.00MiB 2.00MiB /mnt/sdi Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap' At subvol /mnt/sdi/snap At subvol snap *** fiemap in the new filesystem *** /mnt/sdi/snap/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..4095]: 26624..30719 4096 0x2001 /mnt/sdi/snap/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2047]: 26624..28671 2048 0x2000 1: [2048..4095]: 30720..32767 2048 0x1 /mnt/sdi/snap/baz: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2047]: 26624..28671 2048 0x2000 1: [2048..4095]: 32768..34815 2048 0x1 Total Exclusive Set shared Filename 2.00MiB 0.00B - /mnt/sdi/snap/foo 2.00MiB 1.00MiB - /mnt/sdi/snap/bar 2.00MiB 1.00MiB - /mnt/sdi/snap/baz 6.00MiB 2.00MiB - /mnt/sdi/snap 6.00MiB 2.00MiB 2.00MiB /mnt/sdi We end up with two 1M extents that are not shared for files bar and baz. After this change: $ ./test.sh (...) *** fiemap in the original filesystem *** /mnt/sdi/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2047]: 26624..28671 2048 0x2000 1: [2048..4095]: 30720..32767 2048 0x1 /mnt/sdi/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..4095]: 26624..30719 4096 0x2001 /mnt/sdi/baz: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..4095]: 26624..30719 4096 0x2001 Total Exclusive Set shared Filename 2.00MiB 1.00MiB - /mnt/sdi/foo 2.00MiB 0.00B - /mnt/sdi/bar 2.00MiB 0.00B - /mnt/sdi/baz 6.00MiB 1.00MiB 2.00MiB /mnt/sdi Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap' At subvol /mnt/sdi/snap At subvol snap *** fiemap in the new filesystem *** /mnt/sdi/snap/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..4095]: 26624..30719 4096 0x2001 /mnt/sdi/snap/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2047]: 26624..28671 2048 0x2000 1: [2048..4095]: 30720..32767 2048 0x2001 /mnt/sdi/snap/baz: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2047]: 26624..28671 2048 0x2000 1: [2048..4095]: 30720..32767 2048 0x2001 Total Exclusive Set shared Filename 2.00MiB 0.00B - /mnt/sdi/snap/foo 2.00MiB 0.00B - /mnt/sdi/snap/bar 2.00MiB 0.00B - /mnt/sdi/snap/baz 6.00MiB 0.00B - /mnt/sdi/snap 6.00MiB 0.00B 3.00MiB /mnt/sdi Now there's a much better sharing, files bar and baz share 1M of the extent of file foo and the second extent of files bar and baz is shared between themselves. This will later be turned into a test case for fstests. Link: https://lore.kernel.org/linux-btrfs/20221008005704.795b44b0@crass-HP-ZBook-15-G2/ Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 9 +++++---- fs/btrfs/backref.h | 4 ++-- fs/btrfs/scrub.c | 4 ++-- fs/btrfs/send.c | 49 +++++++++++++++++++++++++++++++--------------- 4 files changed, 42 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 04cb608e7cfb..9be11a342de5 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -27,6 +27,7 @@ struct extent_inode_elem { u64 inum; u64 offset; + u64 num_bytes; struct extent_inode_elem *next; }; @@ -37,6 +38,7 @@ static int check_extent_in_eb(const struct btrfs_key *key, struct extent_inode_elem **eie, bool ignore_offset) { + const u64 data_len = btrfs_file_extent_num_bytes(eb, fi); u64 offset = 0; struct extent_inode_elem *e; @@ -45,10 +47,8 @@ static int check_extent_in_eb(const struct btrfs_key *key, !btrfs_file_extent_encryption(eb, fi) && !btrfs_file_extent_other_encoding(eb, fi)) { u64 data_offset; - u64 data_len; data_offset = btrfs_file_extent_offset(eb, fi); - data_len = btrfs_file_extent_num_bytes(eb, fi); if (extent_item_pos < data_offset || extent_item_pos >= data_offset + data_len) @@ -63,6 +63,7 @@ static int check_extent_in_eb(const struct btrfs_key *key, e->next = *eie; e->inum = key->objectid; e->offset = key->offset + offset; + e->num_bytes = data_len; *eie = e; return 0; @@ -2265,7 +2266,7 @@ static int iterate_leaf_refs(struct btrfs_fs_info *fs_info, "ref for %llu resolved, key (%llu EXTEND_DATA %llu), root %llu", extent_item_objectid, eie->inum, eie->offset, root); - ret = iterate(eie->inum, eie->offset, root, ctx); + ret = iterate(eie->inum, eie->offset, eie->num_bytes, root, ctx); if (ret) { btrfs_debug(fs_info, "stopping iteration for %llu due to ret=%d", @@ -2357,7 +2358,7 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info, return ret; } -static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx) +static int build_ino_list(u64 inum, u64 offset, u64 num_bytes, u64 root, void *ctx) { struct btrfs_data_container *inodes = ctx; const size_t c = 3 * sizeof(u64); diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 2dd68f87dca5..8d3598155f3b 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -75,8 +75,8 @@ struct btrfs_backref_share_check_ctx { int prev_extents_cache_slot; }; -typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root, - void *ctx); +typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 num_bytes, + u64 root, void *ctx); struct btrfs_backref_share_check_ctx *btrfs_alloc_backref_share_check_ctx(void); void btrfs_free_backref_share_ctx(struct btrfs_backref_share_check_ctx *ctx); diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 2c7053d20c14..f9c58e39f7e1 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -809,8 +809,8 @@ static noinline_for_stack struct scrub_ctx *scrub_setup_ctx( return ERR_PTR(-ENOMEM); } -static int scrub_print_warning_inode(u64 inum, u64 offset, u64 root, - void *warn_ctx) +static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes, + u64 root, void *warn_ctx) { u32 nlink; int ret; diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 35c12fc7a864..4e175a8579ff 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -76,7 +76,7 @@ struct clone_root { struct btrfs_root *root; u64 ino; u64 offset; - + u64 num_bytes; u64 found_refs; }; @@ -1273,7 +1273,8 @@ static int __clone_root_cmp_sort(const void *e1, const void *e2) * Called for every backref that is found for the current extent. * Results are collected in sctx->clone_roots->ino/offset/found_refs */ -static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_) +static int __iterate_backrefs(u64 ino, u64 offset, u64 num_bytes, u64 root, + void *ctx_) { struct backref_ctx *bctx = ctx_; struct clone_root *found; @@ -1318,15 +1319,17 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_) bctx->found++; found->found_refs++; - if (ino < found->ino) { + + /* + * If the given backref refers to a file extent item with a larger + * number of bytes than what we found before, use the new one so that + * we clone more optimally and end up doing less writes and getting + * less exclusive, non-shared extents at the destination. + */ + if (num_bytes > found->num_bytes) { found->ino = ino; found->offset = offset; - } else if (found->ino == ino) { - /* - * same extent found more then once in the same file. - */ - if (found->offset > offset + bctx->extent_len) - found->offset = offset; + found->num_bytes = num_bytes; } return 0; @@ -1437,6 +1440,7 @@ static int find_extent_clone(struct send_ctx *sctx, cur_clone_root = sctx->clone_roots + i; cur_clone_root->ino = (u64)-1; cur_clone_root->offset = 0; + cur_clone_root->num_bytes = 0; cur_clone_root->found_refs = 0; } @@ -1506,14 +1510,27 @@ static int find_extent_clone(struct send_ctx *sctx, cur_clone_root = NULL; for (i = 0; i < sctx->clone_roots_cnt; i++) { - if (sctx->clone_roots[i].found_refs) { - if (!cur_clone_root) - cur_clone_root = sctx->clone_roots + i; - else if (sctx->clone_roots[i].root == sctx->send_root) - /* prefer clones from send_root over others */ - cur_clone_root = sctx->clone_roots + i; - } + struct clone_root *clone_root = &sctx->clone_roots[i]; + if (clone_root->found_refs == 0) + continue; + + /* + * Choose the root from which we can clone more bytes, to + * minimize write operations and therefore have more extent + * sharing at the destination (the same as in the source). + */ + if (!cur_clone_root || + clone_root->num_bytes > cur_clone_root->num_bytes) { + cur_clone_root = clone_root; + + /* + * We found an optimal clone candidate (any inode from + * any root is fine), so we're done. + */ + if (clone_root->num_bytes >= backref_ctx.extent_len) + break; + } } if (cur_clone_root) { From patchwork Tue Nov 1 16:15:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027165 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3AE75C4332F for ; Tue, 1 Nov 2022 16:16:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230396AbiKAQQM (ORCPT ); Tue, 1 Nov 2022 12:16:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53604 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230392AbiKAQQJ (ORCPT ); Tue, 1 Nov 2022 12:16:09 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A046D1C914 for ; Tue, 1 Nov 2022 09:16:07 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 72E496118E for ; Tue, 1 Nov 2022 16:16:07 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 641A6C43470 for ; Tue, 1 Nov 2022 16:16:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319366; bh=xqa5zA8sPG37wW1NEo1dx9z1z0a0+tQeLEzmy+TfNeA=; h=From:To:Subject:Date:In-Reply-To:References:From; b=mWde60vQAvpm4FQUL3V3+uyUsfHze6v0KbSXagpEnYjFqCZpvHBURGUTgyMzyoV0u uM41MEPXF6sfLXt7alN6i+KbcP6dWqRXDx/nAqcpwP2sBx5tCwf4onUYKqFLVWdzEs 7v15Ou+k7Qcek0hP3gTiBicb+d3Qf4sdNoK5jqQ4p78t3AQr2b7Lv5VY77FHGHa+Y4 +4/vfpdrWLJqwixUR7O7gWvG+hNTVjGZGvNCfX7MSzsTioeV4oN+IanfWxeKG2qXJE rnziUdRW5CO1zTAfvzfKcWRlDrh+L5tLX2MdDoChOeNUwd664Acf59psRVjd+OOOBX GJjox0nppN+RA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 10/18] btrfs: use a single argument for extent offset in backref walking functions Date: Tue, 1 Nov 2022 16:15:46 +0000 Message-Id: <3b49911721081d38fa62ff5a156d037ebd733a5e.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana The interface for find_parent_nodes() has two extent offset related arguments: 1) One u64 pointer argument for the extent offset; 2) One boolean argument to tell if the extent offset should be ignored or not. These are confusing, becase the extent offset pointer can be NULL and in some cases callers pass a NULL value as a way to tell the backref walking code to ignore offsets in file extent items (and simply consider all file extent items that point to the target data extent). The boolean argument was added in commit c995ab3cda3f ("btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents"), but it was never really necessary, it was enough if it could find a way to get a NULL value passed to the "extent_item_pos" argument of find_parent_nodes(). The arguments are also passed to functions called by find_parent_nodes() and respective helper functions, which further makes everything more complicated than needed. Then we have several backref walking related functions that end up calling find_parent_nodes(), either directly or through some other function that they call, and for many we have to use an "extent_item_pos" (u64) argument and a boolean "ignore_offset" argument too. This is confusing and not really necessary. So use a single argument to specify the extent offset, as a simple u64 and not as a pointer, but using a special value of (u64)-1, defined as a documented constant, to indicate when the extent offset should be ignored. This is also preparation work for the upcoming patches in the series that add other arguments to find_parent_nodes() and other related functions that use it. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 87 +++++++++++++++++++++---------------------- fs/btrfs/backref.h | 12 ++++-- fs/btrfs/relocation.c | 2 +- fs/btrfs/scrub.c | 2 +- fs/btrfs/send.c | 2 +- 5 files changed, 54 insertions(+), 51 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 9be11a342de5..432064ee788e 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -35,15 +35,13 @@ static int check_extent_in_eb(const struct btrfs_key *key, const struct extent_buffer *eb, const struct btrfs_file_extent_item *fi, u64 extent_item_pos, - struct extent_inode_elem **eie, - bool ignore_offset) + struct extent_inode_elem **eie) { const u64 data_len = btrfs_file_extent_num_bytes(eb, fi); u64 offset = 0; struct extent_inode_elem *e; - if (!ignore_offset && - !btrfs_file_extent_compression(eb, fi) && + if (!btrfs_file_extent_compression(eb, fi) && !btrfs_file_extent_encryption(eb, fi) && !btrfs_file_extent_other_encoding(eb, fi)) { u64 data_offset; @@ -81,8 +79,7 @@ static void free_inode_elem_list(struct extent_inode_elem *eie) static int find_extent_in_eb(const struct extent_buffer *eb, u64 wanted_disk_byte, u64 extent_item_pos, - struct extent_inode_elem **eie, - bool ignore_offset) + struct extent_inode_elem **eie) { u64 disk_byte; struct btrfs_key key; @@ -111,7 +108,7 @@ static int find_extent_in_eb(const struct extent_buffer *eb, if (disk_byte != wanted_disk_byte) continue; - ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie, ignore_offset); + ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie); if (ret < 0) return ret; } @@ -450,9 +447,9 @@ static int is_shared_data_backref(struct preftrees *preftrees, u64 bytenr) static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, struct ulist *parents, struct preftrees *preftrees, struct prelim_ref *ref, - int level, u64 time_seq, const u64 *extent_item_pos, - bool ignore_offset) + int level, u64 time_seq, u64 extent_item_pos) { + const bool ignore_offset = (extent_item_pos == BTRFS_IGNORE_EXTENT_OFFSET); int ret = 0; int slot; struct extent_buffer *eb; @@ -528,10 +525,9 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, count++; else goto next; - if (extent_item_pos) { + if (!ignore_offset) { ret = check_extent_in_eb(&key, eb, fi, - *extent_item_pos, - &eie, ignore_offset); + extent_item_pos, &eie); if (ret < 0) break; } @@ -541,7 +537,7 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, eie, (void **)&old, GFP_NOFS); if (ret < 0) break; - if (!ret && extent_item_pos) { + if (!ret && !ignore_offset) { while (old->next) old = old->next; old->next = eie; @@ -570,7 +566,7 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info, struct btrfs_path *path, u64 time_seq, struct preftrees *preftrees, struct prelim_ref *ref, struct ulist *parents, - const u64 *extent_item_pos, bool ignore_offset) + u64 extent_item_pos) { struct btrfs_root *root; struct extent_buffer *eb; @@ -664,7 +660,7 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info, } ret = add_all_parents(root, path, parents, preftrees, ref, level, - time_seq, extent_item_pos, ignore_offset); + time_seq, extent_item_pos); out: btrfs_put_root(root); out_free: @@ -712,8 +708,8 @@ static void free_leaf_list(struct ulist *ulist) static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, struct btrfs_path *path, u64 time_seq, struct preftrees *preftrees, - const u64 *extent_item_pos, - struct share_check *sc, bool ignore_offset) + u64 extent_item_pos, + struct share_check *sc) { int err; int ret = 0; @@ -756,8 +752,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, goto out; } err = resolve_indirect_ref(fs_info, path, time_seq, preftrees, - ref, parents, extent_item_pos, - ignore_offset); + ref, parents, extent_item_pos); /* * we can only tolerate ENOENT,otherwise,we should catch error * and return directly. @@ -1340,18 +1335,21 @@ static void store_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx * * Otherwise this returns 0 for success and <0 for an error. * - * If ignore_offset is set to false, only extent refs whose offsets match - * extent_item_pos are returned. If true, every extent ref is returned - * and extent_item_pos is ignored. + * @extent_item_pos is meaningful only if we are dealing with a data extent. + * If its value is not BTRFS_IGNORE_EXTENT_OFFSET, then only collect references + * from file extent items that refer to a section of the data extent that + * contains @extent_item_pos. If its value is BTRFS_IGNORE_EXTENT_OFFSET then + * collect references for every file extent item that points to the data extent. * * FIXME some caching might speed things up */ static int find_parent_nodes(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, u64 time_seq, struct ulist *refs, - struct ulist *roots, const u64 *extent_item_pos, - struct share_check *sc, bool ignore_offset) + struct ulist *roots, u64 extent_item_pos, + struct share_check *sc) { + const bool ignore_offset = (extent_item_pos == BTRFS_IGNORE_EXTENT_OFFSET); struct btrfs_root *root = btrfs_extent_root(fs_info, bytenr); struct btrfs_key key; struct btrfs_path *path; @@ -1539,7 +1537,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, WARN_ON(!RB_EMPTY_ROOT(&preftrees.indirect_missing_keys.root.rb_root)); ret = resolve_indirect_refs(fs_info, path, time_seq, &preftrees, - extent_item_pos, sc, ignore_offset); + extent_item_pos, sc); if (ret) goto out; @@ -1573,8 +1571,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, goto out; } if (ref->count && ref->parent) { - if (extent_item_pos && !ref->inode_list && - ref->level == 0) { + if (!ignore_offset && !ref->inode_list && ref->level == 0) { struct extent_buffer *eb; eb = read_tree_block(fs_info, ref->parent, 0, @@ -1592,7 +1589,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, if (!path->skip_locking) btrfs_tree_read_lock(eb); ret = find_extent_in_eb(eb, bytenr, - *extent_item_pos, &eie, ignore_offset); + extent_item_pos, &eie); if (!path->skip_locking) btrfs_tree_read_unlock(eb); free_extent_buffer(eb); @@ -1611,7 +1608,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, (void **)&eie, GFP_NOFS); if (ret < 0) goto out; - if (!ret && extent_item_pos) { + if (!ret && !ignore_offset) { /* * We've recorded that parent, so we must extend * its inode list here. @@ -1665,7 +1662,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, u64 time_seq, struct ulist **leafs, - const u64 *extent_item_pos, bool ignore_offset) + u64 extent_item_pos) { int ret; @@ -1674,7 +1671,7 @@ int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, return -ENOMEM; ret = find_parent_nodes(trans, fs_info, bytenr, time_seq, - *leafs, NULL, extent_item_pos, NULL, ignore_offset); + *leafs, NULL, extent_item_pos, NULL); if (ret < 0 && ret != -ENOENT) { free_leaf_list(*leafs); return ret; @@ -1698,8 +1695,7 @@ int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, */ static int btrfs_find_all_roots_safe(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots, - bool ignore_offset) + u64 time_seq, struct ulist **roots) { struct ulist *tmp; struct ulist_node *node = NULL; @@ -1718,7 +1714,8 @@ static int btrfs_find_all_roots_safe(struct btrfs_trans_handle *trans, ULIST_ITER_INIT(&uiter); while (1) { ret = find_parent_nodes(trans, fs_info, bytenr, time_seq, - tmp, *roots, NULL, NULL, ignore_offset); + tmp, *roots, BTRFS_IGNORE_EXTENT_OFFSET, + NULL); if (ret < 0 && ret != -ENOENT) { ulist_free(tmp); ulist_free(*roots); @@ -1745,8 +1742,7 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans, if (!trans && !skip_commit_root_sem) down_read(&fs_info->commit_root_sem); - ret = btrfs_find_all_roots_safe(trans, fs_info, bytenr, - time_seq, roots, false); + ret = btrfs_find_all_roots_safe(trans, fs_info, bytenr, time_seq, roots); if (!trans && !skip_commit_root_sem) up_read(&fs_info->commit_root_sem); return ret; @@ -1845,7 +1841,7 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, bool cached; ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, &ctx->refs, - NULL, NULL, &shared, false); + NULL, BTRFS_IGNORE_EXTENT_OFFSET, &shared); if (ret == BACKREF_FOUND_SHARED || ret == BACKREF_FOUND_NOT_SHARED) { /* If shared must return 1, otherwise return 0. */ @@ -2286,8 +2282,7 @@ static int iterate_leaf_refs(struct btrfs_fs_info *fs_info, int iterate_extent_inodes(struct btrfs_fs_info *fs_info, u64 extent_item_objectid, u64 extent_item_pos, int search_commit_root, - iterate_extent_inodes_t *iterate, void *ctx, - bool ignore_offset) + iterate_extent_inodes_t *iterate, void *ctx) { int ret; struct btrfs_trans_handle *trans = NULL; @@ -2318,16 +2313,14 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info, down_read(&fs_info->commit_root_sem); ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid, - seq_elem.seq, &refs, - &extent_item_pos, ignore_offset); + seq_elem.seq, &refs, extent_item_pos); if (ret) goto out; ULIST_ITER_INIT(&ref_uiter); while (!ret && (ref_node = ulist_next(refs, &ref_uiter))) { ret = btrfs_find_all_roots_safe(trans, fs_info, ref_node->val, - seq_elem.seq, &roots, - ignore_offset); + seq_elem.seq, &roots); if (ret) break; ULIST_ITER_INIT(&root_uiter); @@ -2395,10 +2388,14 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) return -EINVAL; - extent_item_pos = logical - found_key.objectid; + if (ignore_offset) + extent_item_pos = BTRFS_IGNORE_EXTENT_OFFSET; + else + extent_item_pos = logical - found_key.objectid; + ret = iterate_extent_inodes(fs_info, found_key.objectid, extent_item_pos, search_commit_root, - build_ino_list, ctx, ignore_offset); + build_ino_list, ctx); return ret; } diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 8d3598155f3b..2eb99f23cc8f 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -12,6 +12,13 @@ #include "disk-io.h" #include "extent_io.h" +/* + * Pass to backref walking functions to tell them to include references from + * all file extent items that point to the target data extent, regardless if + * they refer to the whole extent or just sections of it (bookend extents). + */ +#define BTRFS_IGNORE_EXTENT_OFFSET ((u64)-1) + struct inode_fs_paths { struct btrfs_path *btrfs_path; struct btrfs_root *fs_root; @@ -92,8 +99,7 @@ int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, int iterate_extent_inodes(struct btrfs_fs_info *fs_info, u64 extent_item_objectid, u64 extent_offset, int search_commit_root, - iterate_extent_inodes_t *iterate, void *ctx, - bool ignore_offset); + iterate_extent_inodes_t *iterate, void *ctx); int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, struct btrfs_path *path, void *ctx, @@ -104,7 +110,7 @@ int paths_from_inode(u64 inum, struct inode_fs_paths *ipath); int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, u64 time_seq, struct ulist **leafs, - const u64 *extent_item_pos, bool ignore_offset); + u64 extent_item_pos); int btrfs_find_all_roots(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, u64 time_seq, struct ulist **roots, diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index d119986d1599..45690f7b5900 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -3417,7 +3417,7 @@ int add_data_references(struct reloc_control *rc, btrfs_release_path(path); ret = btrfs_find_all_leafs(NULL, fs_info, extent_key->objectid, - 0, &leaves, NULL, true); + 0, &leaves, BTRFS_IGNORE_EXTENT_OFFSET); if (ret < 0) return ret; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index f9c58e39f7e1..dc178f59f34a 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -969,7 +969,7 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) swarn.dev = dev; iterate_extent_inodes(fs_info, found_key.objectid, extent_item_pos, 1, - scrub_print_warning_inode, &swarn, false); + scrub_print_warning_inode, &swarn); } out: diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 4e175a8579ff..b6d9200af2e3 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1467,7 +1467,7 @@ static int find_extent_clone(struct send_ctx *sctx, extent_item_pos = 0; ret = iterate_extent_inodes(fs_info, found_key.objectid, extent_item_pos, 1, __iterate_backrefs, - &backref_ctx, false); + &backref_ctx); if (ret < 0) goto out; From patchwork Tue Nov 1 16:15:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027170 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6FC53C4332F for ; Tue, 1 Nov 2022 16:16:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230443AbiKAQQQ (ORCPT ); Tue, 1 Nov 2022 12:16:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230413AbiKAQQN (ORCPT ); Tue, 1 Nov 2022 12:16:13 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D048B1C916 for ; Tue, 1 Nov 2022 09:16:10 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 40F9BB81D9F for ; Tue, 1 Nov 2022 16:16:09 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4A91DC433B5 for ; Tue, 1 Nov 2022 16:16:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319367; bh=gwnepN1AGl2YxeYhEAprjo9N3Z1TrFVA9NWB4LiH/RI=; h=From:To:Subject:Date:In-Reply-To:References:From; b=KlFzUtkKTVdhgIsT9TQ1yDmFFe7x1W0xW+/PxYLh/pGAYtEB/e0xMWofsOJIPoccb BE5bEjKR3ccN2frKi6RF+KM9rsOJ7Mv67kH84UE0tmBsLJctgJUFML+QfngBOexDPD A3j5WuFCB5jgLXQ/WkCURbPEWzFDHu+QiHeIwcB+GQhV/xtMmtcQeM7PnyfGDQuHKX 7I67Tokuy98KaQ/fUVBnAS58tQtcuPluJrgfVp8tV0Fc05Aii0fi+xr3mBCEh8MSIk nRQbRUyf3Nb56cEplQwOmldqE+1JOhKY3nMb7PHOACzM4nr9T7TGr8QT5B1eqq59FM mvYuL+3PNdYLA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 11/18] btrfs: use a structure to pass arguments to backref walking functions Date: Tue, 1 Nov 2022 16:15:47 +0000 Message-Id: <4676b4de2371f8aae5a195f2cd3625faf4085271.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana The public backref walking functions have quite a lot of arguments that are passed down the call stack to find_parent_nodes(), the core function of the backref walking code. The next patches in series will need to add even arguments to these functions that should be passed not only to find_parent_nodes(), but also to other functions used by the later (directly or even lower in the call stack). So create a structure to hold all these arguments and state used by the main backref walking function, find_parent_nodes(), and use it as the argument for the public backref walking functions iterate_extent_inodes(), btrfs_find_all_leafs() and btrfs_find_all_roots(). Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 343 +++++++++++++++++----------------- fs/btrfs/backref.h | 76 ++++++-- fs/btrfs/qgroup.c | 38 ++-- fs/btrfs/relocation.c | 19 +- fs/btrfs/scrub.c | 14 +- fs/btrfs/send.c | 15 +- fs/btrfs/tests/qgroup-tests.c | 50 ++++- 7 files changed, 328 insertions(+), 227 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 432064ee788e..a1a00a7bdba5 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -444,12 +444,12 @@ static int is_shared_data_backref(struct preftrees *preftrees, u64 bytenr) return 0; } -static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, +static int add_all_parents(struct btrfs_backref_walk_ctx *ctx, + struct btrfs_root *root, struct btrfs_path *path, struct ulist *parents, struct preftrees *preftrees, struct prelim_ref *ref, - int level, u64 time_seq, u64 extent_item_pos) + int level) { - const bool ignore_offset = (extent_item_pos == BTRFS_IGNORE_EXTENT_OFFSET); int ret = 0; int slot; struct extent_buffer *eb; @@ -484,10 +484,10 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, if (path->slots[0] >= btrfs_header_nritems(eb) || is_shared_data_backref(preftrees, eb->start) || ref->root_id != btrfs_header_owner(eb)) { - if (time_seq == BTRFS_SEQ_LAST) + if (ctx->time_seq == BTRFS_SEQ_LAST) ret = btrfs_next_leaf(root, path); else - ret = btrfs_next_old_leaf(root, path, time_seq); + ret = btrfs_next_old_leaf(root, path, ctx->time_seq); } while (!ret && count < ref->count) { @@ -508,10 +508,10 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, if (slot == 0 && (is_shared_data_backref(preftrees, eb->start) || ref->root_id != btrfs_header_owner(eb))) { - if (time_seq == BTRFS_SEQ_LAST) + if (ctx->time_seq == BTRFS_SEQ_LAST) ret = btrfs_next_leaf(root, path); else - ret = btrfs_next_old_leaf(root, path, time_seq); + ret = btrfs_next_old_leaf(root, path, ctx->time_seq); continue; } fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); @@ -525,9 +525,9 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, count++; else goto next; - if (!ignore_offset) { + if (!ctx->ignore_extent_item_pos) { ret = check_extent_in_eb(&key, eb, fi, - extent_item_pos, &eie); + ctx->extent_item_pos, &eie); if (ret < 0) break; } @@ -537,7 +537,7 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, eie, (void **)&old, GFP_NOFS); if (ret < 0) break; - if (!ret && !ignore_offset) { + if (!ret && !ctx->ignore_extent_item_pos) { while (old->next) old = old->next; old->next = eie; @@ -545,10 +545,10 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, eie = NULL; } next: - if (time_seq == BTRFS_SEQ_LAST) + if (ctx->time_seq == BTRFS_SEQ_LAST) ret = btrfs_next_item(root, path); else - ret = btrfs_next_old_item(root, path, time_seq); + ret = btrfs_next_old_item(root, path, ctx->time_seq); } if (ret > 0) @@ -562,11 +562,10 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, * resolve an indirect backref in the form (root_id, key, level) * to a logical address */ -static int resolve_indirect_ref(struct btrfs_fs_info *fs_info, - struct btrfs_path *path, u64 time_seq, +static int resolve_indirect_ref(struct btrfs_backref_walk_ctx *ctx, + struct btrfs_path *path, struct preftrees *preftrees, - struct prelim_ref *ref, struct ulist *parents, - u64 extent_item_pos) + struct prelim_ref *ref, struct ulist *parents) { struct btrfs_root *root; struct extent_buffer *eb; @@ -584,9 +583,9 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info, * here. */ if (path->search_commit_root) - root = btrfs_get_fs_root_commit_root(fs_info, path, ref->root_id); + root = btrfs_get_fs_root_commit_root(ctx->fs_info, path, ref->root_id); else - root = btrfs_get_fs_root(fs_info, ref->root_id, false); + root = btrfs_get_fs_root(ctx->fs_info, ref->root_id, false); if (IS_ERR(root)) { ret = PTR_ERR(root); goto out_free; @@ -598,17 +597,17 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info, goto out; } - if (btrfs_is_testing(fs_info)) { + if (btrfs_is_testing(ctx->fs_info)) { ret = -ENOENT; goto out; } if (path->search_commit_root) root_level = btrfs_header_level(root->commit_root); - else if (time_seq == BTRFS_SEQ_LAST) + else if (ctx->time_seq == BTRFS_SEQ_LAST) root_level = btrfs_header_level(root->node); else - root_level = btrfs_old_root_level(root, time_seq); + root_level = btrfs_old_root_level(root, ctx->time_seq); if (root_level + 1 == level) goto out; @@ -636,12 +635,12 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info, search_key.offset >= LLONG_MAX) search_key.offset = 0; path->lowest_level = level; - if (time_seq == BTRFS_SEQ_LAST) + if (ctx->time_seq == BTRFS_SEQ_LAST) ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0); else - ret = btrfs_search_old_slot(root, &search_key, path, time_seq); + ret = btrfs_search_old_slot(root, &search_key, path, ctx->time_seq); - btrfs_debug(fs_info, + btrfs_debug(ctx->fs_info, "search slot in root %llu (level %d, ref count %d) returned %d for key (%llu %u %llu)", ref->root_id, level, ref->count, ret, ref->key_for_search.objectid, ref->key_for_search.type, @@ -659,8 +658,7 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info, eb = path->nodes[level]; } - ret = add_all_parents(root, path, parents, preftrees, ref, level, - time_seq, extent_item_pos); + ret = add_all_parents(ctx, root, path, parents, preftrees, ref, level); out: btrfs_put_root(root); out_free: @@ -705,10 +703,9 @@ static void free_leaf_list(struct ulist *ulist) * rbtree as they are encountered. The new backrefs are subsequently * resolved as above. */ -static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, - struct btrfs_path *path, u64 time_seq, +static int resolve_indirect_refs(struct btrfs_backref_walk_ctx *ctx, + struct btrfs_path *path, struct preftrees *preftrees, - u64 extent_item_pos, struct share_check *sc) { int err; @@ -751,14 +748,13 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, ret = BACKREF_FOUND_SHARED; goto out; } - err = resolve_indirect_ref(fs_info, path, time_seq, preftrees, - ref, parents, extent_item_pos); + err = resolve_indirect_ref(ctx, path, preftrees, ref, parents); /* * we can only tolerate ENOENT,otherwise,we should catch error * and return directly. */ if (err == -ENOENT) { - prelim_ref_insert(fs_info, &preftrees->direct, ref, + prelim_ref_insert(ctx->fs_info, &preftrees->direct, ref, NULL); continue; } else if (err) { @@ -787,7 +783,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, memcpy(new_ref, ref, sizeof(*ref)); new_ref->parent = node->val; new_ref->inode_list = unode_aux_to_inode_list(node); - prelim_ref_insert(fs_info, &preftrees->direct, + prelim_ref_insert(ctx->fs_info, &preftrees->direct, new_ref, NULL); } @@ -795,7 +791,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, * Now it's a direct ref, put it in the direct tree. We must * do this last because the ref could be merged/freed here. */ - prelim_ref_insert(fs_info, &preftrees->direct, ref, NULL); + prelim_ref_insert(ctx->fs_info, &preftrees->direct, ref, NULL); ulist_reinit(parents); cond_resched(); @@ -1325,32 +1321,18 @@ static void store_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx * indirect refs to their parent bytenr. * When roots are found, they're added to the roots list * - * If time_seq is set to BTRFS_SEQ_LAST, it will not search delayed_refs, and - * behave much like trans == NULL case, the difference only lies in it will not - * commit root. - * The special case is for qgroup to search roots in commit_transaction(). - * - * @sc - if !NULL, then immediately return BACKREF_FOUND_SHARED when a - * shared extent is detected. + * @ctx: Backref walking context object, must be not NULL. + * @sc: If !NULL, then immediately return BACKREF_FOUND_SHARED when a + * shared extent is detected. * * Otherwise this returns 0 for success and <0 for an error. * - * @extent_item_pos is meaningful only if we are dealing with a data extent. - * If its value is not BTRFS_IGNORE_EXTENT_OFFSET, then only collect references - * from file extent items that refer to a section of the data extent that - * contains @extent_item_pos. If its value is BTRFS_IGNORE_EXTENT_OFFSET then - * collect references for every file extent item that points to the data extent. - * * FIXME some caching might speed things up */ -static int find_parent_nodes(struct btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist *refs, - struct ulist *roots, u64 extent_item_pos, +static int find_parent_nodes(struct btrfs_backref_walk_ctx *ctx, struct share_check *sc) { - const bool ignore_offset = (extent_item_pos == BTRFS_IGNORE_EXTENT_OFFSET); - struct btrfs_root *root = btrfs_extent_root(fs_info, bytenr); + struct btrfs_root *root = btrfs_extent_root(ctx->fs_info, ctx->bytenr); struct btrfs_key key; struct btrfs_path *path; struct btrfs_delayed_ref_root *delayed_refs = NULL; @@ -1368,11 +1350,11 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, /* Roots ulist is not needed when using a sharedness check context. */ if (sc) - ASSERT(roots == NULL); + ASSERT(ctx->roots == NULL); - key.objectid = bytenr; + key.objectid = ctx->bytenr; key.offset = (u64)-1; - if (btrfs_fs_incompat(fs_info, SKINNY_METADATA)) + if (btrfs_fs_incompat(ctx->fs_info, SKINNY_METADATA)) key.type = BTRFS_METADATA_ITEM_KEY; else key.type = BTRFS_EXTENT_ITEM_KEY; @@ -1380,12 +1362,12 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, path = btrfs_alloc_path(); if (!path) return -ENOMEM; - if (!trans) { + if (!ctx->trans) { path->search_commit_root = 1; path->skip_locking = 1; } - if (time_seq == BTRFS_SEQ_LAST) + if (ctx->time_seq == BTRFS_SEQ_LAST) path->skip_locking = 1; again: @@ -1401,17 +1383,17 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, goto out; } - if (trans && likely(trans->type != __TRANS_DUMMY) && - time_seq != BTRFS_SEQ_LAST) { + if (ctx->trans && likely(ctx->trans->type != __TRANS_DUMMY) && + ctx->time_seq != BTRFS_SEQ_LAST) { /* * We have a specific time_seq we care about and trans which * means we have the path lock, we need to grab the ref head and * lock it so we have a consistent view of the refs at the given * time. */ - delayed_refs = &trans->transaction->delayed_refs; + delayed_refs = &ctx->trans->transaction->delayed_refs; spin_lock(&delayed_refs->lock); - head = btrfs_find_delayed_ref_head(delayed_refs, bytenr); + head = btrfs_find_delayed_ref_head(delayed_refs, ctx->bytenr); if (head) { if (!mutex_trylock(&head->mutex)) { refcount_inc(&head->refs); @@ -1429,7 +1411,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, goto again; } spin_unlock(&delayed_refs->lock); - ret = add_delayed_refs(fs_info, head, time_seq, + ret = add_delayed_refs(ctx->fs_info, head, ctx->time_seq, &preftrees, sc); mutex_unlock(&head->mutex); if (ret) @@ -1447,14 +1429,14 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, leaf = path->nodes[0]; slot = path->slots[0]; btrfs_item_key_to_cpu(leaf, &key, slot); - if (key.objectid == bytenr && + if (key.objectid == ctx->bytenr && (key.type == BTRFS_EXTENT_ITEM_KEY || key.type == BTRFS_METADATA_ITEM_KEY)) { - ret = add_inline_refs(fs_info, path, bytenr, + ret = add_inline_refs(ctx->fs_info, path, ctx->bytenr, &info_level, &preftrees, sc); if (ret) goto out; - ret = add_keyed_refs(root, path, bytenr, info_level, + ret = add_keyed_refs(root, path, ctx->bytenr, info_level, &preftrees, sc); if (ret) goto out; @@ -1483,7 +1465,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, * extent item pointing to the data extent) is shared, that is, if any * of the extent buffers in the path is referenced by other trees. */ - if (sc && bytenr == sc->data_bytenr) { + if (sc && ctx->bytenr == sc->data_bytenr) { /* * If our data extent is from a generation more recent than the * last generation used to snapshot the root, then we know that @@ -1530,14 +1512,13 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, btrfs_release_path(path); - ret = add_missing_keys(fs_info, &preftrees, path->skip_locking == 0); + ret = add_missing_keys(ctx->fs_info, &preftrees, path->skip_locking == 0); if (ret) goto out; WARN_ON(!RB_EMPTY_ROOT(&preftrees.indirect_missing_keys.root.rb_root)); - ret = resolve_indirect_refs(fs_info, path, time_seq, &preftrees, - extent_item_pos, sc); + ret = resolve_indirect_refs(ctx, path, &preftrees, sc); if (ret) goto out; @@ -1564,17 +1545,18 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, * e.g. different offsets would not be merged, * and would retain their original ref->count < 0. */ - if (roots && ref->count && ref->root_id && ref->parent == 0) { + if (ctx->roots && ref->count && ref->root_id && ref->parent == 0) { /* no parent == root of tree */ - ret = ulist_add(roots, ref->root_id, 0, GFP_NOFS); + ret = ulist_add(ctx->roots, ref->root_id, 0, GFP_NOFS); if (ret < 0) goto out; } if (ref->count && ref->parent) { - if (!ignore_offset && !ref->inode_list && ref->level == 0) { + if (!ctx->ignore_extent_item_pos && !ref->inode_list && + ref->level == 0) { struct extent_buffer *eb; - eb = read_tree_block(fs_info, ref->parent, 0, + eb = read_tree_block(ctx->fs_info, ref->parent, 0, 0, ref->level, NULL); if (IS_ERR(eb)) { ret = PTR_ERR(eb); @@ -1588,8 +1570,8 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, if (!path->skip_locking) btrfs_tree_read_lock(eb); - ret = find_extent_in_eb(eb, bytenr, - extent_item_pos, &eie); + ret = find_extent_in_eb(eb, ctx->bytenr, + ctx->extent_item_pos, &eie); if (!path->skip_locking) btrfs_tree_read_unlock(eb); free_extent_buffer(eb); @@ -1603,12 +1585,12 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, */ eie = NULL; } - ret = ulist_add_merge_ptr(refs, ref->parent, + ret = ulist_add_merge_ptr(ctx->refs, ref->parent, ref->inode_list, (void **)&eie, GFP_NOFS); if (ret < 0) goto out; - if (!ret && !ignore_offset) { + if (!ret && !ctx->ignore_extent_item_pos) { /* * We've recorded that parent, so we must extend * its inode list here. @@ -1652,28 +1634,29 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, } /* - * Finds all leafs with a reference to the specified combination of bytenr and - * offset. key_list_head will point to a list of corresponding keys (caller must - * free each list element). The leafs will be stored in the leafs ulist, which - * must be freed with ulist_free. + * Finds all leaves with a reference to the specified combination of + * @ctx->bytenr and @ctx->extent_item_pos. The bytenr of the found leaves are + * added to the ulist at @ctx->refs, and that ulist is allocated by this + * function. The caller should free the ulist with free_leaf_list() if + * @ctx->ignore_extent_item_pos is false, otherwise a fimple ulist_free() is + * enough. * - * returns 0 on success, <0 on error + * Returns 0 on success and < 0 on error. On error @ctx->refs is not allocated. */ -int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **leafs, - u64 extent_item_pos) +int btrfs_find_all_leafs(struct btrfs_backref_walk_ctx *ctx) { int ret; - *leafs = ulist_alloc(GFP_NOFS); - if (!*leafs) + ASSERT(ctx->refs == NULL); + + ctx->refs = ulist_alloc(GFP_NOFS); + if (!ctx->refs) return -ENOMEM; - ret = find_parent_nodes(trans, fs_info, bytenr, time_seq, - *leafs, NULL, extent_item_pos, NULL); + ret = find_parent_nodes(ctx, NULL); if (ret < 0 && ret != -ENOENT) { - free_leaf_list(*leafs); + free_leaf_list(ctx->refs); + ctx->refs = NULL; return ret; } @@ -1681,7 +1664,7 @@ int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, } /* - * walk all backrefs for a given extent to find all roots that reference this + * Walk all backrefs for a given extent to find all roots that reference this * extent. Walking a backref means finding all extents that reference this * extent and in turn walk the backrefs of those, too. Naturally this is a * recursive process, but here it is implemented in an iterative fashion: We @@ -1689,62 +1672,74 @@ int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, * list. In turn, we find all referencing extents for those, further appending * to the list. The way we iterate the list allows adding more elements after * the current while iterating. The process stops when we reach the end of the - * list. Found roots are added to the roots list. + * list. + * + * Found roots are added to @ctx->roots, which is allocated by this function and + * @ctx->roots should be NULL when calling this function. This function also + * requires @ctx->refs to be NULL, as it uses it for allocating a ulist to do + * temporary work, and frees it before returning. * - * returns 0 on success, < 0 on error. + * Returns 0 on success, < 0 on error. On error @ctx->roots is always NULL. */ -static int btrfs_find_all_roots_safe(struct btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots) +static int btrfs_find_all_roots_safe(struct btrfs_backref_walk_ctx *ctx) { - struct ulist *tmp; - struct ulist_node *node = NULL; + const u64 orig_bytenr = ctx->bytenr; + const bool orig_ignore_extent_item_pos = ctx->ignore_extent_item_pos; struct ulist_iterator uiter; - int ret; + int ret = 0; - tmp = ulist_alloc(GFP_NOFS); - if (!tmp) + ASSERT(ctx->refs == NULL); + ASSERT(ctx->roots == NULL); + + ctx->refs = ulist_alloc(GFP_NOFS); + if (!ctx->refs) return -ENOMEM; - *roots = ulist_alloc(GFP_NOFS); - if (!*roots) { - ulist_free(tmp); + + ctx->roots = ulist_alloc(GFP_NOFS); + if (!ctx->roots) { + ulist_free(ctx->refs); + ctx->refs = NULL; return -ENOMEM; } + ctx->ignore_extent_item_pos = true; + ULIST_ITER_INIT(&uiter); while (1) { - ret = find_parent_nodes(trans, fs_info, bytenr, time_seq, - tmp, *roots, BTRFS_IGNORE_EXTENT_OFFSET, - NULL); + struct ulist_node *node; + + ret = find_parent_nodes(ctx, NULL); if (ret < 0 && ret != -ENOENT) { - ulist_free(tmp); - ulist_free(*roots); - *roots = NULL; - return ret; + ulist_free(ctx->roots); + ctx->roots = NULL; + break; } - node = ulist_next(tmp, &uiter); + ret = 0; + node = ulist_next(ctx->refs, &uiter); if (!node) break; - bytenr = node->val; + ctx->bytenr = node->val; cond_resched(); } - ulist_free(tmp); - return 0; + ulist_free(ctx->refs); + ctx->refs = NULL; + ctx->bytenr = orig_bytenr; + ctx->ignore_extent_item_pos = orig_ignore_extent_item_pos; + + return ret; } -int btrfs_find_all_roots(struct btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots, +int btrfs_find_all_roots(struct btrfs_backref_walk_ctx *ctx, bool skip_commit_root_sem) { int ret; - if (!trans && !skip_commit_root_sem) - down_read(&fs_info->commit_root_sem); - ret = btrfs_find_all_roots_safe(trans, fs_info, bytenr, time_seq, roots); - if (!trans && !skip_commit_root_sem) - up_read(&fs_info->commit_root_sem); + if (!ctx->trans && !skip_commit_root_sem) + down_read(&ctx->fs_info->commit_root_sem); + ret = btrfs_find_all_roots_safe(ctx); + if (!ctx->trans && !skip_commit_root_sem) + up_read(&ctx->fs_info->commit_root_sem); return ret; } @@ -1794,6 +1789,7 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, u64 extent_gen, struct btrfs_backref_share_check_ctx *ctx) { + struct btrfs_backref_walk_ctx walk_ctx = { 0 }; struct btrfs_root *root = inode->root; struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_trans_handle *trans; @@ -1830,8 +1826,14 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, down_read(&fs_info->commit_root_sem); } else { btrfs_get_tree_mod_seq(fs_info, &elem); + walk_ctx.time_seq = elem.seq; } + walk_ctx.ignore_extent_item_pos = true; + walk_ctx.trans = trans; + walk_ctx.fs_info = fs_info; + walk_ctx.refs = &ctx->refs; + /* -1 means we are in the bytenr of the data extent. */ level = -1; ULIST_ITER_INIT(&uiter); @@ -1840,8 +1842,8 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, bool is_shared; bool cached; - ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, &ctx->refs, - NULL, BTRFS_IGNORE_EXTENT_OFFSET, &shared); + walk_ctx.bytenr = bytenr; + ret = find_parent_nodes(&walk_ctx, &shared); if (ret == BACKREF_FOUND_SHARED || ret == BACKREF_FOUND_NOT_SHARED) { /* If shared must return 1, otherwise return 0. */ @@ -2279,73 +2281,81 @@ static int iterate_leaf_refs(struct btrfs_fs_info *fs_info, * the given parameters. * when the iterator function returns a non-zero value, iteration stops. */ -int iterate_extent_inodes(struct btrfs_fs_info *fs_info, - u64 extent_item_objectid, u64 extent_item_pos, - int search_commit_root, - iterate_extent_inodes_t *iterate, void *ctx) +int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx, + bool search_commit_root, + iterate_extent_inodes_t *iterate, void *user_ctx) { int ret; - struct btrfs_trans_handle *trans = NULL; - struct ulist *refs = NULL; - struct ulist *roots = NULL; - struct ulist_node *ref_node = NULL; - struct ulist_node *root_node = NULL; + struct ulist *refs; + struct ulist_node *ref_node; struct btrfs_seq_list seq_elem = BTRFS_SEQ_LIST_INIT(seq_elem); struct ulist_iterator ref_uiter; - struct ulist_iterator root_uiter; - btrfs_debug(fs_info, "resolving all inodes for extent %llu", - extent_item_objectid); + btrfs_debug(ctx->fs_info, "resolving all inodes for extent %llu", + ctx->bytenr); + + ASSERT(ctx->trans == NULL); if (!search_commit_root) { - trans = btrfs_attach_transaction(fs_info->tree_root); + struct btrfs_trans_handle *trans; + + trans = btrfs_attach_transaction(ctx->fs_info->tree_root); if (IS_ERR(trans)) { if (PTR_ERR(trans) != -ENOENT && PTR_ERR(trans) != -EROFS) return PTR_ERR(trans); trans = NULL; } + ctx->trans = trans; } - if (trans) - btrfs_get_tree_mod_seq(fs_info, &seq_elem); - else - down_read(&fs_info->commit_root_sem); + if (ctx->trans) { + btrfs_get_tree_mod_seq(ctx->fs_info, &seq_elem); + ctx->time_seq = seq_elem.seq; + } else { + down_read(&ctx->fs_info->commit_root_sem); + } - ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid, - seq_elem.seq, &refs, extent_item_pos); + ret = btrfs_find_all_leafs(ctx); if (ret) goto out; + refs = ctx->refs; + ctx->refs = NULL; ULIST_ITER_INIT(&ref_uiter); while (!ret && (ref_node = ulist_next(refs, &ref_uiter))) { - ret = btrfs_find_all_roots_safe(trans, fs_info, ref_node->val, - seq_elem.seq, &roots); + struct ulist_node *root_node; + struct ulist_iterator root_uiter; + + ctx->bytenr = ref_node->val; + ret = btrfs_find_all_roots_safe(ctx); if (ret) break; + ULIST_ITER_INIT(&root_uiter); - while (!ret && (root_node = ulist_next(roots, &root_uiter))) { - btrfs_debug(fs_info, + while (!ret && (root_node = ulist_next(ctx->roots, &root_uiter))) { + btrfs_debug(ctx->fs_info, "root %llu references leaf %llu, data list %#llx", root_node->val, ref_node->val, ref_node->aux); - ret = iterate_leaf_refs(fs_info, + ret = iterate_leaf_refs(ctx->fs_info, (struct extent_inode_elem *) (uintptr_t)ref_node->aux, - root_node->val, - extent_item_objectid, - iterate, ctx); + root_node->val, ctx->bytenr, + iterate, user_ctx); } - ulist_free(roots); + ulist_free(ctx->roots); + ctx->roots = NULL; } free_leaf_list(refs); out: - if (trans) { - btrfs_put_tree_mod_seq(fs_info, &seq_elem); - btrfs_end_transaction(trans); + if (ctx->trans) { + btrfs_put_tree_mod_seq(ctx->fs_info, &seq_elem); + btrfs_end_transaction(ctx->trans); + ctx->trans = NULL; } else { - up_read(&fs_info->commit_root_sem); + up_read(&ctx->fs_info->commit_root_sem); } return ret; @@ -2375,8 +2385,8 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, struct btrfs_path *path, void *ctx, bool ignore_offset) { + struct btrfs_backref_walk_ctx walk_ctx = { 0 }; int ret; - u64 extent_item_pos; u64 flags = 0; struct btrfs_key found_key; int search_commit_root = path->search_commit_root; @@ -2388,16 +2398,15 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) return -EINVAL; + walk_ctx.bytenr = found_key.objectid; if (ignore_offset) - extent_item_pos = BTRFS_IGNORE_EXTENT_OFFSET; + walk_ctx.ignore_extent_item_pos = true; else - extent_item_pos = logical - found_key.objectid; + walk_ctx.extent_item_pos = logical - found_key.objectid; + walk_ctx.fs_info = fs_info; - ret = iterate_extent_inodes(fs_info, found_key.objectid, - extent_item_pos, search_commit_root, - build_ino_list, ctx); - - return ret; + return iterate_extent_inodes(&walk_ctx, search_commit_root, + build_ino_list, ctx); } static int inode_to_path(u64 inum, u32 name_len, unsigned long name_off, diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 2eb99f23cc8f..ce700dfcc016 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -13,11 +13,63 @@ #include "extent_io.h" /* - * Pass to backref walking functions to tell them to include references from - * all file extent items that point to the target data extent, regardless if - * they refer to the whole extent or just sections of it (bookend extents). + * Context and arguments for backref walking functions. Some of the fields are + * to be filled by the caller of such functions while other are filled by the + * functions themselves, as described below. */ -#define BTRFS_IGNORE_EXTENT_OFFSET ((u64)-1) +struct btrfs_backref_walk_ctx { + /* + * The address of the extent for which we are doing backref walking. + * Can be either a data extent or a metadata extent. + * + * Must always be set by the top level caller. + */ + u64 bytenr; + /* + * Offset relative to the target extent. This is only used for data + * extents, and it's meaningful because we can have file extent items + * that point only to a section of a data extent ("bookend" extents), + * and we want to filter out any that don't point to a section of the + * data extent containing the given offset. + * + * Must always be set by the top level caller. + */ + u64 extent_item_pos; + /* + * If true and bytenr corresponds to a data extent, then references from + * all file extent items that point to the data extent are considered, + * @extent_item_pos is ignored. + */ + bool ignore_extent_item_pos; + /* A valid transaction handle or NULL. */ + struct btrfs_trans_handle *trans; + /* + * The file system's info object, can not be NULL. + * + * Must always be set by the top level caller. + */ + struct btrfs_fs_info *fs_info; + /* + * Time sequence acquired from btrfs_get_tree_mod_seq(), in case the + * caller joined the tree mod log to get a consistent view of b+trees + * while we do backref walking, or BTRFS_SEQ_LAST. + * When using BTRFS_SEQ_LAST, delayed refs are not checked and it uses + * commit roots when searching b+trees - this is a special case for + * qgroups used during a transaction commit. + */ + u64 time_seq; + /* + * Used to collect the bytenr of metadata extents that point to the + * target extent. + */ + struct ulist *refs; + /* + * List used to collect the IDs of the roots from which the target + * extent is accessible. Can be NULL in case the caller does not care + * about collecting root IDs. + */ + struct ulist *roots; +}; struct inode_fs_paths { struct btrfs_path *btrfs_path; @@ -96,10 +148,9 @@ int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, struct btrfs_key *key, struct btrfs_extent_item *ei, u32 item_size, u64 *out_root, u8 *out_level); -int iterate_extent_inodes(struct btrfs_fs_info *fs_info, - u64 extent_item_objectid, - u64 extent_offset, int search_commit_root, - iterate_extent_inodes_t *iterate, void *ctx); +int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx, + bool search_commit_root, + iterate_extent_inodes_t *iterate, void *user_ctx); int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, struct btrfs_path *path, void *ctx, @@ -107,13 +158,8 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, int paths_from_inode(u64 inum, struct inode_fs_paths *ipath); -int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **leafs, - u64 extent_item_pos); -int btrfs_find_all_roots(struct btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info, u64 bytenr, - u64 time_seq, struct ulist **roots, +int btrfs_find_all_leafs(struct btrfs_backref_walk_ctx *ctx); +int btrfs_find_all_roots(struct btrfs_backref_walk_ctx *ctx, bool skip_commit_root_sem); char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path, u32 name_len, unsigned long name_off, diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 34e8acf02227..35856ea28e32 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1794,8 +1794,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info, int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans, struct btrfs_qgroup_extent_record *qrecord) { - struct ulist *old_root; - u64 bytenr = qrecord->bytenr; + struct btrfs_backref_walk_ctx ctx = { 0 }; int ret; /* @@ -1822,8 +1821,10 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans, if (trans->fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING) return 0; - ret = btrfs_find_all_roots(NULL, trans->fs_info, bytenr, 0, &old_root, - true); + ctx.bytenr = qrecord->bytenr; + ctx.fs_info = trans->fs_info; + + ret = btrfs_find_all_roots(&ctx, true); if (ret < 0) { qgroup_mark_inconsistent(trans->fs_info); btrfs_warn(trans->fs_info, @@ -1839,7 +1840,7 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans, * * So modifying qrecord->old_roots is safe here */ - qrecord->old_roots = old_root; + qrecord->old_roots = ctx.roots; return 0; } @@ -2750,17 +2751,22 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans) if (!ret && !(fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)) { + struct btrfs_backref_walk_ctx ctx = { 0 }; + + ctx.bytenr = record->bytenr; + ctx.fs_info = fs_info; + /* * Old roots should be searched when inserting qgroup * extent record */ if (WARN_ON(!record->old_roots)) { /* Search commit root to find old_roots */ - ret = btrfs_find_all_roots(NULL, fs_info, - record->bytenr, 0, - &record->old_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret < 0) goto cleanup; + record->old_roots = ctx.roots; + ctx.roots = NULL; } /* Free the reserved data space */ @@ -2773,10 +2779,11 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans) * which doesn't lock tree or delayed_refs and search * current root. It's safe inside commit_transaction(). */ - ret = btrfs_find_all_roots(trans, fs_info, - record->bytenr, BTRFS_SEQ_LAST, &new_roots, false); + ctx.trans = trans; + ret = btrfs_find_all_roots(&ctx, false); if (ret < 0) goto cleanup; + new_roots = ctx.roots; if (qgroup_to_skip) { ulist_del(new_roots, qgroup_to_skip, 0); ulist_del(record->old_roots, qgroup_to_skip, @@ -3249,7 +3256,6 @@ static int qgroup_rescan_leaf(struct btrfs_trans_handle *trans, struct btrfs_root *extent_root; struct btrfs_key found; struct extent_buffer *scratch_leaf = NULL; - struct ulist *roots = NULL; u64 num_bytes; bool done; int slot; @@ -3299,6 +3305,8 @@ static int qgroup_rescan_leaf(struct btrfs_trans_handle *trans, mutex_unlock(&fs_info->qgroup_rescan_lock); for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) { + struct btrfs_backref_walk_ctx ctx = { 0 }; + btrfs_item_key_to_cpu(scratch_leaf, &found, slot); if (found.type != BTRFS_EXTENT_ITEM_KEY && found.type != BTRFS_METADATA_ITEM_KEY) @@ -3308,13 +3316,15 @@ static int qgroup_rescan_leaf(struct btrfs_trans_handle *trans, else num_bytes = found.offset; - ret = btrfs_find_all_roots(NULL, fs_info, found.objectid, 0, - &roots, false); + ctx.bytenr = found.objectid; + ctx.fs_info = fs_info; + + ret = btrfs_find_all_roots(&ctx, false); if (ret < 0) goto out; /* For rescan, just pass old_roots as NULL */ ret = btrfs_qgroup_account_extent(trans, found.objectid, - num_bytes, NULL, roots); + num_bytes, NULL, ctx.roots); if (ret < 0) goto out; } diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 45690f7b5900..2ecca24e1001 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -3408,24 +3408,27 @@ int add_data_references(struct reloc_control *rc, struct btrfs_path *path, struct rb_root *blocks) { - struct btrfs_fs_info *fs_info = rc->extent_root->fs_info; - struct ulist *leaves = NULL; + struct btrfs_backref_walk_ctx ctx = { 0 }; struct ulist_iterator leaf_uiter; struct ulist_node *ref_node = NULL; - const u32 blocksize = fs_info->nodesize; + const u32 blocksize = rc->extent_root->fs_info->nodesize; int ret = 0; btrfs_release_path(path); - ret = btrfs_find_all_leafs(NULL, fs_info, extent_key->objectid, - 0, &leaves, BTRFS_IGNORE_EXTENT_OFFSET); + + ctx.bytenr = extent_key->objectid; + ctx.ignore_extent_item_pos = true; + ctx.fs_info = rc->extent_root->fs_info; + + ret = btrfs_find_all_leafs(&ctx); if (ret < 0) return ret; ULIST_ITER_INIT(&leaf_uiter); - while ((ref_node = ulist_next(leaves, &leaf_uiter))) { + while ((ref_node = ulist_next(ctx.refs, &leaf_uiter))) { struct extent_buffer *eb; - eb = read_tree_block(fs_info, ref_node->val, 0, 0, 0, NULL); + eb = read_tree_block(ctx.fs_info, ref_node->val, 0, 0, 0, NULL); if (IS_ERR(eb)) { ret = PTR_ERR(eb); break; @@ -3441,7 +3444,7 @@ int add_data_references(struct reloc_control *rc, } if (ret < 0) free_block_list(blocks); - ulist_free(leaves); + ulist_free(ctx.refs); return ret; } diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index dc178f59f34a..06c6626eae3d 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -909,7 +909,6 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) struct btrfs_extent_item *ei; struct scrub_warning swarn; unsigned long ptr = 0; - u64 extent_item_pos; u64 flags = 0; u64 ref_root; u32 item_size; @@ -941,7 +940,6 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) if (ret < 0) goto out; - extent_item_pos = swarn.logical - found_key.objectid; swarn.extent_item_size = found_key.offset; eb = path->nodes[0]; @@ -964,12 +962,18 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) } while (ret != 1); btrfs_release_path(path); } else { + struct btrfs_backref_walk_ctx ctx = { 0 }; + btrfs_release_path(path); + + ctx.bytenr = found_key.objectid; + ctx.extent_item_pos = swarn.logical - found_key.objectid; + ctx.fs_info = fs_info; + swarn.path = path; swarn.dev = dev; - iterate_extent_inodes(fs_info, found_key.objectid, - extent_item_pos, 1, - scrub_print_warning_inode, &swarn); + + iterate_extent_inodes(&ctx, true, scrub_print_warning_inode, &swarn); } out: diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index b6d9200af2e3..95eaa9ce190b 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1356,12 +1356,12 @@ static int find_extent_clone(struct send_ctx *sctx, u64 logical; u64 disk_byte; u64 num_bytes; - u64 extent_item_pos; u64 extent_refs; u64 flags = 0; struct btrfs_file_extent_item *fi; struct extent_buffer *eb = path->nodes[0]; - struct backref_ctx backref_ctx = {0}; + struct backref_ctx backref_ctx = { 0 }; + struct btrfs_backref_walk_ctx backref_walk_ctx = { 0 }; struct clone_root *cur_clone_root; struct btrfs_key found_key; struct btrfs_path *tmp_path; @@ -1461,14 +1461,13 @@ static int find_extent_clone(struct send_ctx *sctx, /* * Now collect all backrefs. */ + backref_walk_ctx.bytenr = found_key.objectid; if (compressed == BTRFS_COMPRESS_NONE) - extent_item_pos = logical - found_key.objectid; - else - extent_item_pos = 0; - ret = iterate_extent_inodes(fs_info, found_key.objectid, - extent_item_pos, 1, __iterate_backrefs, - &backref_ctx); + backref_walk_ctx.extent_item_pos = logical - found_key.objectid; + backref_walk_ctx.fs_info = fs_info; + ret = iterate_extent_inodes(&backref_walk_ctx, true, __iterate_backrefs, + &backref_ctx); if (ret < 0) goto out; diff --git a/fs/btrfs/tests/qgroup-tests.c b/fs/btrfs/tests/qgroup-tests.c index 65b65d55d1f6..3fc8dc3fd980 100644 --- a/fs/btrfs/tests/qgroup-tests.c +++ b/fs/btrfs/tests/qgroup-tests.c @@ -205,6 +205,7 @@ static int remove_extent_ref(struct btrfs_root *root, u64 bytenr, static int test_no_shared_qgroup(struct btrfs_root *root, u32 sectorsize, u32 nodesize) { + struct btrfs_backref_walk_ctx ctx = { 0 }; struct btrfs_trans_handle trans; struct btrfs_fs_info *fs_info = root->fs_info; struct ulist *old_roots = NULL; @@ -220,16 +221,22 @@ static int test_no_shared_qgroup(struct btrfs_root *root, return ret; } + ctx.bytenr = nodesize; + ctx.trans = &trans; + ctx.fs_info = fs_info; + /* * Since the test trans doesn't have the complicated delayed refs, * we can only call btrfs_qgroup_account_extent() directly to test * quota. */ - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret) { test_err("couldn't find old roots: %d", ret); return ret; } + old_roots = ctx.roots; + ctx.roots = NULL; ret = insert_normal_tree_ref(root, nodesize, nodesize, 0, BTRFS_FS_TREE_OBJECTID); @@ -238,12 +245,14 @@ static int test_no_shared_qgroup(struct btrfs_root *root, return ret; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret) { ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } + new_roots = ctx.roots; + ctx.roots = NULL; ret = btrfs_qgroup_account_extent(&trans, nodesize, nodesize, old_roots, new_roots); @@ -262,11 +271,13 @@ static int test_no_shared_qgroup(struct btrfs_root *root, return -EINVAL; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret) { test_err("couldn't find old roots: %d", ret); return ret; } + old_roots = ctx.roots; + ctx.roots = NULL; ret = remove_extent_item(root, nodesize, nodesize); if (ret) { @@ -274,12 +285,14 @@ static int test_no_shared_qgroup(struct btrfs_root *root, return -EINVAL; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret) { ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } + new_roots = ctx.roots; + ctx.roots = NULL; ret = btrfs_qgroup_account_extent(&trans, nodesize, nodesize, old_roots, new_roots); @@ -304,6 +317,7 @@ static int test_no_shared_qgroup(struct btrfs_root *root, static int test_multiple_refs(struct btrfs_root *root, u32 sectorsize, u32 nodesize) { + struct btrfs_backref_walk_ctx ctx = { 0 }; struct btrfs_trans_handle trans; struct btrfs_fs_info *fs_info = root->fs_info; struct ulist *old_roots = NULL; @@ -324,11 +338,17 @@ static int test_multiple_refs(struct btrfs_root *root, return ret; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); + ctx.bytenr = nodesize; + ctx.trans = &trans; + ctx.fs_info = fs_info; + + ret = btrfs_find_all_roots(&ctx, false); if (ret) { test_err("couldn't find old roots: %d", ret); return ret; } + old_roots = ctx.roots; + ctx.roots = NULL; ret = insert_normal_tree_ref(root, nodesize, nodesize, 0, BTRFS_FS_TREE_OBJECTID); @@ -337,12 +357,14 @@ static int test_multiple_refs(struct btrfs_root *root, return ret; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret) { ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } + new_roots = ctx.roots; + ctx.roots = NULL; ret = btrfs_qgroup_account_extent(&trans, nodesize, nodesize, old_roots, new_roots); @@ -357,11 +379,13 @@ static int test_multiple_refs(struct btrfs_root *root, return -EINVAL; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret) { test_err("couldn't find old roots: %d", ret); return ret; } + old_roots = ctx.roots; + ctx.roots = NULL; ret = add_tree_ref(root, nodesize, nodesize, 0, BTRFS_FIRST_FREE_OBJECTID); @@ -370,12 +394,14 @@ static int test_multiple_refs(struct btrfs_root *root, return ret; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret) { ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } + new_roots = ctx.roots; + ctx.roots = NULL; ret = btrfs_qgroup_account_extent(&trans, nodesize, nodesize, old_roots, new_roots); @@ -396,11 +422,13 @@ static int test_multiple_refs(struct btrfs_root *root, return -EINVAL; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &old_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret) { test_err("couldn't find old roots: %d", ret); return ret; } + old_roots = ctx.roots; + ctx.roots = NULL; ret = remove_extent_ref(root, nodesize, nodesize, 0, BTRFS_FIRST_FREE_OBJECTID); @@ -409,12 +437,14 @@ static int test_multiple_refs(struct btrfs_root *root, return ret; } - ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); + ret = btrfs_find_all_roots(&ctx, false); if (ret) { ulist_free(old_roots); test_err("couldn't find old roots: %d", ret); return ret; } + new_roots = ctx.roots; + ctx.roots = NULL; ret = btrfs_qgroup_account_extent(&trans, nodesize, nodesize, old_roots, new_roots); From patchwork Tue Nov 1 16:15:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027167 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C5C8C43217 for ; Tue, 1 Nov 2022 16:16:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230392AbiKAQQP (ORCPT ); Tue, 1 Nov 2022 12:16:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230406AbiKAQQM (ORCPT ); Tue, 1 Nov 2022 12:16:12 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 701411C909 for ; Tue, 1 Nov 2022 09:16:11 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 2A147B81E25 for ; Tue, 1 Nov 2022 16:16:10 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 59DCAC433C1 for ; Tue, 1 Nov 2022 16:16:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319368; bh=iSyxUGwEWB16etuwUo0C5aZdvGPQe5EuB0rQR4yO10g=; h=From:To:Subject:Date:In-Reply-To:References:From; b=VqAiimckiyThB5Q26RjEYTbnnhHChm3Etduht6P0tvhhyMGxKeP1lu/KsA2sygY78 NICs0qoX1gbfFcZ78OF44WZPi7L9a/zbUohzWvhY4YMQqU2HRb2g8qVs+zqDum3kuM uh7DBimKRlZLh2UxCrwLxG5j+WmXQsA+VsdOMSJJEBiTqSeFox/HzbV7ygh9j9nKQB yCvD+mRCIis/vNNOFZTZZS7uy0xthSl7ZLOot8zQiTUXpXh4MQqsP52Yfv+Hd7xCUn MwAOOR9NM3i5sM/N2VQjO7qLK5P957qjpeOMGZi30uwmAn23qSd5gksbmwi9OtGOJN /udnNk5W0sEWA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 12/18] btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes() Date: Tue, 1 Nov 2022 16:15:48 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At iterate_extent_inodes() we collect a ulist of leaves for a given extent with a call to btrfs_find_all_leafs() and then we enter a loop where we iterate over all the collected leaves. Each iteration of that loop does a call to btrfs_find_all_roots_safe(), to determine all roots from which a leaf is accessible, and that results in allocating and releasing a ulist to store the root IDs. Instead of allocating and releasing the roots ulist on every iteration, allocate a ulist before entering the loop and keep using it on each iteration, reinitializing the ulist at the end of each iteration. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 46 +++++++++++++++++++++++++++++++--------------- 1 file changed, 31 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index a1a00a7bdba5..dc276ce3afe0 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1674,32 +1674,36 @@ int btrfs_find_all_leafs(struct btrfs_backref_walk_ctx *ctx) * the current while iterating. The process stops when we reach the end of the * list. * - * Found roots are added to @ctx->roots, which is allocated by this function and - * @ctx->roots should be NULL when calling this function. This function also - * requires @ctx->refs to be NULL, as it uses it for allocating a ulist to do - * temporary work, and frees it before returning. + * Found roots are added to @ctx->roots, which is allocated by this function if + * it points to NULL, in which case the caller is responsible for freeing it + * after it's not needed anymore. + * This function requires @ctx->refs to be NULL, as it uses it for allocating a + * ulist to do temporary work, and frees it before returning. * - * Returns 0 on success, < 0 on error. On error @ctx->roots is always NULL. + * Returns 0 on success, < 0 on error. */ static int btrfs_find_all_roots_safe(struct btrfs_backref_walk_ctx *ctx) { const u64 orig_bytenr = ctx->bytenr; const bool orig_ignore_extent_item_pos = ctx->ignore_extent_item_pos; + bool roots_ulist_allocated = false; struct ulist_iterator uiter; int ret = 0; ASSERT(ctx->refs == NULL); - ASSERT(ctx->roots == NULL); ctx->refs = ulist_alloc(GFP_NOFS); if (!ctx->refs) return -ENOMEM; - ctx->roots = ulist_alloc(GFP_NOFS); if (!ctx->roots) { - ulist_free(ctx->refs); - ctx->refs = NULL; - return -ENOMEM; + ctx->roots = ulist_alloc(GFP_NOFS); + if (!ctx->roots) { + ulist_free(ctx->refs); + ctx->refs = NULL; + return -ENOMEM; + } + roots_ulist_allocated = true; } ctx->ignore_extent_item_pos = true; @@ -1710,8 +1714,10 @@ static int btrfs_find_all_roots_safe(struct btrfs_backref_walk_ctx *ctx) ret = find_parent_nodes(ctx, NULL); if (ret < 0 && ret != -ENOENT) { - ulist_free(ctx->roots); - ctx->roots = NULL; + if (roots_ulist_allocated) { + ulist_free(ctx->roots); + ctx->roots = NULL; + } break; } ret = 0; @@ -2295,6 +2301,11 @@ int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx, ctx->bytenr); ASSERT(ctx->trans == NULL); + ASSERT(ctx->roots == NULL); + + ctx->roots = ulist_alloc(GFP_NOFS); + if (!ctx->roots) + return -ENOMEM; if (!search_commit_root) { struct btrfs_trans_handle *trans; @@ -2302,8 +2313,11 @@ int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx, trans = btrfs_attach_transaction(ctx->fs_info->tree_root); if (IS_ERR(trans)) { if (PTR_ERR(trans) != -ENOENT && - PTR_ERR(trans) != -EROFS) + PTR_ERR(trans) != -EROFS) { + ulist_free(ctx->roots); + ctx->roots = NULL; return PTR_ERR(trans); + } trans = NULL; } ctx->trans = trans; @@ -2344,8 +2358,7 @@ int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx, root_node->val, ctx->bytenr, iterate, user_ctx); } - ulist_free(ctx->roots); - ctx->roots = NULL; + ulist_reinit(ctx->roots); } free_leaf_list(refs); @@ -2358,6 +2371,9 @@ int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx, up_read(&ctx->fs_info->commit_root_sem); } + ulist_free(ctx->roots); + ctx->roots = NULL; + return ret; } From patchwork Tue Nov 1 16:15:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027166 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B881C43219 for ; Tue, 1 Nov 2022 16:16:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230415AbiKAQQO (ORCPT ); Tue, 1 Nov 2022 12:16:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53634 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230384AbiKAQQL (ORCPT ); Tue, 1 Nov 2022 12:16:11 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B37D91C90E for ; Tue, 1 Nov 2022 09:16:10 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4FD956118E for ; Tue, 1 Nov 2022 16:16:10 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3E867C43144 for ; Tue, 1 Nov 2022 16:16:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319369; bh=a+oHdRyfanntEWk8r2LPgSO+Ilr1llMwhkoFklk6sUk=; h=From:To:Subject:Date:In-Reply-To:References:From; b=NdQP+7+WQt77zoPIvc/wAW6yQx194aK3wnfdQBlqzeV/iUMKBWZaceF388zyzifEP VlaXvsZj69OWB4LyJaHheiLHldsE+hb3hdwJOQA9NA5oF+nREuSWCdxzWewUK5WGmG BqxKVADOz0BW5A+Taf14GXW38MkvrW9p/fRDSXUeijTVeaGOj31IooRQKmBZeTEH+O l5P1HaPfc0OJOn0i+tXZ0pm4wsmxipwWMycVgqEXo8GMjukk9B//j+RUaih10TAZEq v6hfgkSCNGoTAvfDc94bSakDOJyZbDas12PMeZAfVpIKO02nJf/c3WKlisSvnFvuPh kBsaxa9QBh1eg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 13/18] btrfs: constify ulist parameter of ulist_next() Date: Tue, 1 Nov 2022 16:15:49 +0000 Message-Id: <056c766f0411fceba7880ae20c68d9b4d16edbb1.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana The ulist_next() iterator function does not need to change the given ulist so make it const. This will allow the next patch in the series to pass a ulist to a function that does not need, and should not, modify the ulist. Signed-off-by: Filipe Manana --- fs/btrfs/ulist.c | 2 +- fs/btrfs/ulist.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c index 13af1e41f9d7..33606025513d 100644 --- a/fs/btrfs/ulist.c +++ b/fs/btrfs/ulist.c @@ -266,7 +266,7 @@ int ulist_del(struct ulist *ulist, u64 val, u64 aux) * It is allowed to call ulist_add during an enumeration. Newly added items * are guaranteed to show up in the running enumeration. */ -struct ulist_node *ulist_next(struct ulist *ulist, struct ulist_iterator *uiter) +struct ulist_node *ulist_next(const struct ulist *ulist, struct ulist_iterator *uiter) { struct ulist_node *node; diff --git a/fs/btrfs/ulist.h b/fs/btrfs/ulist.h index 02fda0a2d4ce..b2cef187ea8e 100644 --- a/fs/btrfs/ulist.h +++ b/fs/btrfs/ulist.h @@ -66,7 +66,7 @@ static inline int ulist_add_merge_ptr(struct ulist *ulist, u64 val, void *aux, #endif } -struct ulist_node *ulist_next(struct ulist *ulist, +struct ulist_node *ulist_next(const struct ulist *ulist, struct ulist_iterator *uiter); #define ULIST_ITER_INIT(uiter) ((uiter)->cur_list = NULL) From patchwork Tue Nov 1 16:15:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027168 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0FFDC433FE for ; Tue, 1 Nov 2022 16:16:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230428AbiKAQQP (ORCPT ); Tue, 1 Nov 2022 12:16:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53692 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230409AbiKAQQN (ORCPT ); Tue, 1 Nov 2022 12:16:13 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A6AB01C92F for ; Tue, 1 Nov 2022 09:16:11 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3841F61668 for ; Tue, 1 Nov 2022 16:16:11 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 24B00C43470 for ; Tue, 1 Nov 2022 16:16:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319370; bh=nDVWlKOx6XfmieL9u0wV9XrPdsLzAadVK+5xFdL7KpE=; h=From:To:Subject:Date:In-Reply-To:References:From; b=ZLhSFcsmzqTOfUsTxOQfm/aS16mBOiV67s0AFeECBVGwIhZoZx3OXtZjCsEd/ED1q B3fjxiij94xxGRDGmB4gyIk8rlfkJ/LwCGl6jTOiMotRPb9Q+Cho3V5bbWjcHA1Arg 5jAfXOg2MFRGnnewaNWjgFV4Rs3dJWVCKu2o+sZI8MoYCiehcKiSCDYMV4rwvSFC7r emfROnEFAb4izCe67PynfYFXDwPQoV+V3QVuM0VmwHFR+8WLd+fQ4/P8Q3va4rbbPp 6VGzAZJszcD6bw+epPrZJUXYiQcrgTfGQ1VAhh2JNkxtIy9ZvfNfOispJ6leUmIJPl neWv76KK2WKTA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 14/18] btrfs: send: cache leaf to roots mapping during backref walking Date: Tue, 1 Nov 2022 16:15:50 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During a send operation, when doing backref walking to determine which inodes/offsets/roots we can clone from, the most repetitive and expensive step is to map each leaf that has file extent items pointing to the target data extent to the IDs of the roots from which the leaves are accessible, which happens at iterate_extent_inodes(). That step requires finding every parent node of a leaf, then the parent of each parent, and so on until we reach a root node. So it's a naturally expensive operation, and repetitive because each leaf can have hundreds of file extent items (for a nodesize of 16K, that can be slightly over 200 file extent items). There's also temporal locality, as we process all file extent items from a leave before moving the next leaf. This change caches the mapping of leaves to root IDs, to avoid repeating those computations over and over again. The cache is limited to a maximum of 128 entries, with each entry being a struct with a size of 128 bytes, so the maximum cache size is 16K plus any nodes internally allocated by the maple tree that is used to index pointers to those structs. The cache is invalidated whenever we detect relocation happened since we started filling the cache, because if relocation happened then extent buffers for leaves and nodes of the trees used by a send operation may have been reallocated. This cache also allows for another important optimization that is introduced in the next patch in the series. This change is part of a patchset comprised of the following patches: 01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs() 02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes() 03/17 btrfs: fix ulist leaks in error paths of qgroup self tests 04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests 05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone 06/17 btrfs: send: update comment at find_extent_clone() 07/17 btrfs: send: drop unnecessary backref context field initializations 08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source 09/17 btrfs: send: optimize clone detection to increase extent sharing 10/17 btrfs: use a single argument for extent offset in backref walking functions 11/17 btrfs: use a structure to pass arguments to backref walking functions 12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes() 13/17 btrfs: constify ulist parameter of ulist_next() 14/17 btrfs: send: cache leaf to roots mapping during backref walking 15/17 btrfs: send: skip unnecessary backref iterations 16/17 btrfs: send: avoid double extent tree search when finding clone source 17/17 btrfs: send: skip resolution of our own backref when finding clone source Performance test results are in the changelog of patch 17/17. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 52 ++++++++++--- fs/btrfs/backref.h | 11 +++ fs/btrfs/send.c | 185 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 236 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index dc276ce3afe0..9dacf487017d 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -2303,21 +2303,14 @@ int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx, ASSERT(ctx->trans == NULL); ASSERT(ctx->roots == NULL); - ctx->roots = ulist_alloc(GFP_NOFS); - if (!ctx->roots) - return -ENOMEM; - if (!search_commit_root) { struct btrfs_trans_handle *trans; trans = btrfs_attach_transaction(ctx->fs_info->tree_root); if (IS_ERR(trans)) { if (PTR_ERR(trans) != -ENOENT && - PTR_ERR(trans) != -EROFS) { - ulist_free(ctx->roots); - ctx->roots = NULL; + PTR_ERR(trans) != -EROFS) return PTR_ERR(trans); - } trans = NULL; } ctx->trans = trans; @@ -2338,23 +2331,58 @@ int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx, ULIST_ITER_INIT(&ref_uiter); while (!ret && (ref_node = ulist_next(refs, &ref_uiter))) { + const u64 leaf_bytenr = ref_node->val; struct ulist_node *root_node; struct ulist_iterator root_uiter; + struct extent_inode_elem *inode_list; + + inode_list = (struct extent_inode_elem *)(uintptr_t)ref_node->aux; + + if (ctx->cache_lookup) { + const u64 *root_ids; + int root_count; + bool cached; + + cached = ctx->cache_lookup(leaf_bytenr, ctx->user_ctx, + &root_ids, &root_count); + if (cached) { + for (int i = 0; i < root_count; i++) { + ret = iterate_leaf_refs(ctx->fs_info, + inode_list, + root_ids[i], + leaf_bytenr, + iterate, + user_ctx); + if (ret) + break; + } + continue; + } + } + + if (!ctx->roots) { + ctx->roots = ulist_alloc(GFP_NOFS); + if (!ctx->roots) { + ret = -ENOMEM; + break; + } + } - ctx->bytenr = ref_node->val; + ctx->bytenr = leaf_bytenr; ret = btrfs_find_all_roots_safe(ctx); if (ret) break; + if (ctx->cache_store) + ctx->cache_store(leaf_bytenr, ctx->roots, ctx->user_ctx); + ULIST_ITER_INIT(&root_uiter); while (!ret && (root_node = ulist_next(ctx->roots, &root_uiter))) { btrfs_debug(ctx->fs_info, "root %llu references leaf %llu, data list %#llx", root_node->val, ref_node->val, ref_node->aux); - ret = iterate_leaf_refs(ctx->fs_info, - (struct extent_inode_elem *) - (uintptr_t)ref_node->aux, + ret = iterate_leaf_refs(ctx->fs_info, inode_list, root_node->val, ctx->bytenr, iterate, user_ctx); } diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index ce700dfcc016..5005f579f235 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -69,6 +69,17 @@ struct btrfs_backref_walk_ctx { * about collecting root IDs. */ struct ulist *roots; + /* + * Used by iterate_extent_inodes(). Lookup and store functions for an + * optional cache which maps the logical address (bytenr) of leaves + * to an array of root IDs. + */ + bool (*cache_lookup)(u64 leaf_bytenr, void *user_ctx, + const u64 **root_ids_ret, int *root_count_ret); + void (*cache_store)(u64 leaf_bytenr, const struct ulist *root_ids, + void *user_ctx); + /* Context object to pass to @cache_lookup and @cache_store. */ + void *user_ctx; }; struct inode_fs_paths { diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 95eaa9ce190b..4b79d52d74a4 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -83,6 +83,39 @@ struct clone_root { #define SEND_CTX_MAX_NAME_CACHE_SIZE 128 #define SEND_CTX_NAME_CACHE_CLEAN_SIZE (SEND_CTX_MAX_NAME_CACHE_SIZE * 2) +/* + * Limit the root_ids array of struct backref_cache_entry to 12 elements. + * This makes the size of a cache entry to be exactly 128 bytes on x86_64. + * The most common case is to have a single root for cloning, which corresponds + * to the send root. Having the user specify more than 11 clone roots is not + * common, and in such rare cases we simply don't use caching if the number of + * cloning roots that lead down to a leaf is more than 12. + */ +#define SEND_MAX_BACKREF_CACHE_ROOTS 12 + +/* + * Max number of entries in the cache. + * With SEND_MAX_BACKREF_CACHE_ROOTS as 12, the size in bytes, excluding + * maple tree's internal nodes, is 16K. + */ +#define SEND_MAX_BACKREF_CACHE_SIZE 128 + +/* + * A backref cache entry maps a leaf to a list of IDs of roots from which the + * leaf is accessible and we can use for clone operations. + * With SEND_MAX_BACKREF_CACHE_ROOTS as 12, each cache entry is 128 bytes (on + * x86_64). + */ +struct backref_cache_entry { + /* List to link to the cache's lru list. */ + struct list_head list; + /* The key for this entry in the cache. */ + u64 key; + u64 root_ids[SEND_MAX_BACKREF_CACHE_ROOTS]; + /* Number of valid elements in the root_ids array. */ + int num_roots; +}; + struct send_ctx { struct file *send_filp; loff_t send_off; @@ -251,6 +284,14 @@ struct send_ctx { struct rb_root rbtree_new_refs; struct rb_root rbtree_deleted_refs; + + struct { + u64 last_reloc_trans; + struct list_head lru_list; + struct maple_tree entries; + /* Number of entries stored in the cache. */ + int size; + } backref_cache; }; struct pending_dir_move { @@ -1335,6 +1376,142 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 num_bytes, u64 root, return 0; } +static void empty_backref_cache(struct send_ctx *sctx) +{ + struct backref_cache_entry *entry; + struct backref_cache_entry *tmp; + + list_for_each_entry_safe(entry, tmp, &sctx->backref_cache.lru_list, list) + kfree(entry); + + INIT_LIST_HEAD(&sctx->backref_cache.lru_list); + mtree_destroy(&sctx->backref_cache.entries); + sctx->backref_cache.size = 0; +} + +static bool lookup_backref_cache(u64 leaf_bytenr, void *ctx, + const u64 **root_ids_ret, int *root_count_ret) +{ + struct send_ctx *sctx = ctx; + struct btrfs_fs_info *fs_info = sctx->send_root->fs_info; + const u64 key = leaf_bytenr >> fs_info->sectorsize_bits; + struct backref_cache_entry *entry; + + if (sctx->backref_cache.size == 0) + return false; + + /* + * If relocation happened since we first filled the cache, then we must + * empty the cache and can not use it, because even though we operate on + * read-only roots, their leaves and nodes may have been reallocated and + * now be used for different nodes/leaves of the same tree or some other + * tree. + * + * We are called from iterate_extent_inodes() while either holding a + * transaction handle or holding fs_info->commit_root_sem, so no need + * to take any lock here. + */ + if (fs_info->last_reloc_trans > sctx->backref_cache.last_reloc_trans) { + empty_backref_cache(sctx); + return false; + } + + entry = mtree_load(&sctx->backref_cache.entries, key); + if (!entry) + return false; + + *root_ids_ret = entry->root_ids; + *root_count_ret = entry->num_roots; + list_move_tail(&entry->list, &sctx->backref_cache.lru_list); + + return true; +} + +static void store_backref_cache(u64 leaf_bytenr, const struct ulist *root_ids, + void *ctx) +{ + struct send_ctx *sctx = ctx; + struct btrfs_fs_info *fs_info = sctx->send_root->fs_info; + struct backref_cache_entry *new_entry; + struct ulist_iterator uiter; + struct ulist_node *node; + int ret; + + /* + * We're called while holding a transaction handle or while holding + * fs_info->commit_root_sem (at iterate_extent_inodes()), so must do a + * NOFS allocation. + */ + new_entry = kmalloc(sizeof(struct backref_cache_entry), GFP_NOFS); + /* No worries, cache is optional. */ + if (!new_entry) + return; + + new_entry->key = leaf_bytenr >> fs_info->sectorsize_bits; + new_entry->num_roots = 0; + ULIST_ITER_INIT(&uiter); + while ((node = ulist_next(root_ids, &uiter)) != NULL) { + const u64 root_id = node->val; + struct clone_root *root; + + root = bsearch((void *)(uintptr_t)root_id, sctx->clone_roots, + sctx->clone_roots_cnt, sizeof(struct clone_root), + __clone_root_cmp_bsearch); + if (!root) + continue; + + /* Too many roots, just exit, no worries as caching is optional. */ + if (new_entry->num_roots >= SEND_MAX_BACKREF_CACHE_ROOTS) { + kfree(new_entry); + return; + } + + new_entry->root_ids[new_entry->num_roots] = root_id; + new_entry->num_roots++; + } + + /* + * We may have not added any roots to the new cache entry, which means + * none of the roots is part of the list of roots from which we are + * allowed to clone. Cache the new entry as it's still useful to avoid + * backref walking to determine which roots have a path to the leaf. + */ + + if (sctx->backref_cache.size >= SEND_MAX_BACKREF_CACHE_SIZE) { + struct backref_cache_entry *lru_entry; + struct backref_cache_entry *mt_entry; + + lru_entry = list_first_entry(&sctx->backref_cache.lru_list, + struct backref_cache_entry, list); + mt_entry = mtree_erase(&sctx->backref_cache.entries, lru_entry->key); + ASSERT(mt_entry == lru_entry); + list_del(&mt_entry->list); + kfree(mt_entry); + sctx->backref_cache.size--; + } + + ret = mtree_insert(&sctx->backref_cache.entries, new_entry->key, + new_entry, GFP_NOFS); + ASSERT(ret == 0 || ret == -ENOMEM); + if (ret) { + /* Caching is optional, no worries. */ + kfree(new_entry); + return; + } + + list_add_tail(&new_entry->list, &sctx->backref_cache.lru_list); + + /* + * We are called from iterate_extent_inodes() while either holding a + * transaction handle or holding fs_info->commit_root_sem, so no need + * to take any lock here. + */ + if (sctx->backref_cache.size == 0) + sctx->backref_cache.last_reloc_trans = fs_info->last_reloc_trans; + + sctx->backref_cache.size++; +} + /* * Given an inode, offset and extent item, it finds a good clone for a clone * instruction. Returns -ENOENT when none could be found. The function makes @@ -1465,6 +1642,9 @@ static int find_extent_clone(struct send_ctx *sctx, if (compressed == BTRFS_COMPRESS_NONE) backref_walk_ctx.extent_item_pos = logical - found_key.objectid; backref_walk_ctx.fs_info = fs_info; + backref_walk_ctx.cache_lookup = lookup_backref_cache; + backref_walk_ctx.cache_store = store_backref_cache; + backref_walk_ctx.user_ctx = sctx; ret = iterate_extent_inodes(&backref_walk_ctx, true, __iterate_backrefs, &backref_ctx); @@ -7940,6 +8120,9 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) sctx->rbtree_new_refs = RB_ROOT; sctx->rbtree_deleted_refs = RB_ROOT; + INIT_LIST_HEAD(&sctx->backref_cache.lru_list); + mt_init(&sctx->backref_cache.entries); + sctx->clone_roots = kvcalloc(sizeof(*sctx->clone_roots), arg->clone_sources_count + 1, GFP_KERNEL); @@ -8131,6 +8314,8 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) close_current_inode(sctx); + empty_backref_cache(sctx); + kfree(sctx); } From patchwork Tue Nov 1 16:15:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027173 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12E58C4321E for ; Tue, 1 Nov 2022 16:16:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230458AbiKAQQU (ORCPT ); Tue, 1 Nov 2022 12:16:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230409AbiKAQQP (ORCPT ); Tue, 1 Nov 2022 12:16:15 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4BCDE1C92F for ; Tue, 1 Nov 2022 09:16:14 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id D343FB81E25 for ; Tue, 1 Nov 2022 16:16:12 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0D2F9C433B5 for ; Tue, 1 Nov 2022 16:16:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319371; bh=H+BsPAqv3KDCz5W1k+HIjPia0EjPq3Y9u6nC6xSaVE8=; h=From:To:Subject:Date:In-Reply-To:References:From; b=ueQwz8UgvQDvug9yhN+UVXEpFOzMd91PiOHX9xdLG9t4CZ11j17ercoOH0XGBRtvD loAmsYc66hYpZaC46YT1VXPHiM1Hz4jA4GMfP9XIya0w3T98Ed8b1p2MtdXIyeBiKw F5BF2Kn+ZYT5IjlIV1BQh8wvP4uZ0W4W/17JewXVlHO1wxyYr7qyhYsSJJf7oRAsKb Bih8eEc0me/DnQd3Y5Zwx2+jxXD1YkkyHPBZ9L+C7VTTU1w25SkAyLaf8UwEAYEN1O JW6VprYIu3y3mxzLcVX7Iyj8zd5U2ZOJOdIi8Djy4shYga170XI+TjT3QUDBZp7gYv yJUXggXcjKhkQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 15/18] btrfs: send: skip unnecessary backref iterations Date: Tue, 1 Nov 2022 16:15:51 +0000 Message-Id: <9bb6ab50fe0cd7c0f7c5837c386dbedee54ff3a8.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When looking for a clone source for an extent, we are iterating over all the backreferences for an extent. This is often a waste of time, because once we find a good clone source we could stop immediately instead of continuing backref walking, which is expensive. Basically what happens currently is this: 1) Call iterate_extent_inodes() to iterate over all the backreferences; 2) It calls btrfs_find_all_leafs() which in turn calls the main function to walk over backrefs and collect them - find_parent_nodes(); 3) Then we collect all the references for our target data extent from the extent tree (and delayed refs if any), add them to the rb trees, resolve all the indirect backreferences and search for all the file extent items in fs trees, building a list of inodes for each one of them (struct extent_inode_elem); 4) Then back at iterate_extent_inodes() we find all the roots associated to each found leaf, and call the callback __iterate_backrefs defined at send.c for each inode in the inode list associated to each leaf. Some times one the first backreferences we find in a fs tree is optimal to satisfy the clone operation that send wants to perform, and in that case we could stop immediately and avoid resolving all the remaining indirect backreferences (search fs trees for the respective file extent items, etc). This possibly if when we find a fs tree leaf with a file extent item we are able to know what are all the roots that can lead to the leaf - this is now possible after the previous patch in the series that adds a cache that maps leaves to a list of roots. So we can now shortcircuit backref walking during send, by having the callback we pass to iterate_extent_inodes() to be called when we find a file extent item for an indirect backreference, and have it return a special value when it found a suitable backreference and it does not need to look for more backreferences. This change does that. This change is part of a patchset comprised of the following patches: 01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs() 02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes() 03/17 btrfs: fix ulist leaks in error paths of qgroup self tests 04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests 05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone 06/17 btrfs: send: update comment at find_extent_clone() 07/17 btrfs: send: drop unnecessary backref context field initializations 08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source 09/17 btrfs: send: optimize clone detection to increase extent sharing 10/17 btrfs: use a single argument for extent offset in backref walking functions 11/17 btrfs: use a structure to pass arguments to backref walking functions 12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes() 13/17 btrfs: constify ulist parameter of ulist_next() 14/17 btrfs: send: cache leaf to roots mapping during backref walking 15/17 btrfs: send: skip unnecessary backref iterations 16/17 btrfs: send: avoid double extent tree search when finding clone source 17/17 btrfs: send: skip resolution of our own backref when finding clone source Performance test results are in the changelog of patch 17/17. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 73 ++++++++++++++++++++++++++++------------- fs/btrfs/backref.h | 44 +++++++++++++++++++++---- fs/btrfs/send.c | 81 +++++++++++++++++++++++----------------------- 3 files changed, 128 insertions(+), 70 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 9dacf487017d..bb59ba68d7ee 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -31,15 +31,18 @@ struct extent_inode_elem { struct extent_inode_elem *next; }; -static int check_extent_in_eb(const struct btrfs_key *key, +static int check_extent_in_eb(struct btrfs_backref_walk_ctx *ctx, + const struct btrfs_key *key, const struct extent_buffer *eb, const struct btrfs_file_extent_item *fi, - u64 extent_item_pos, struct extent_inode_elem **eie) { const u64 data_len = btrfs_file_extent_num_bytes(eb, fi); - u64 offset = 0; + u64 offset = key->offset; struct extent_inode_elem *e; + const u64 *root_ids; + int root_count; + bool cached; if (!btrfs_file_extent_compression(eb, fi) && !btrfs_file_extent_encryption(eb, fi) && @@ -48,19 +51,38 @@ static int check_extent_in_eb(const struct btrfs_key *key, data_offset = btrfs_file_extent_offset(eb, fi); - if (extent_item_pos < data_offset || - extent_item_pos >= data_offset + data_len) + if (ctx->extent_item_pos < data_offset || + ctx->extent_item_pos >= data_offset + data_len) return 1; - offset = extent_item_pos - data_offset; + offset += ctx->extent_item_pos - data_offset; + } + + if (!ctx->indirect_ref_iterator || !ctx->cache_lookup) + goto add_inode_elem; + + cached = ctx->cache_lookup(eb->start, ctx->user_ctx, &root_ids, + &root_count); + if (!cached) + goto add_inode_elem; + + for (int i = 0; i < root_count; i++) { + int ret; + + ret = ctx->indirect_ref_iterator(key->objectid, offset, + data_len, root_ids[i], + ctx->user_ctx); + if (ret) + return ret; } +add_inode_elem: e = kmalloc(sizeof(*e), GFP_NOFS); if (!e) return -ENOMEM; e->next = *eie; e->inum = key->objectid; - e->offset = key->offset + offset; + e->offset = offset; e->num_bytes = data_len; *eie = e; @@ -77,8 +99,8 @@ static void free_inode_elem_list(struct extent_inode_elem *eie) } } -static int find_extent_in_eb(const struct extent_buffer *eb, - u64 wanted_disk_byte, u64 extent_item_pos, +static int find_extent_in_eb(struct btrfs_backref_walk_ctx *ctx, + const struct extent_buffer *eb, struct extent_inode_elem **eie) { u64 disk_byte; @@ -105,11 +127,11 @@ static int find_extent_in_eb(const struct extent_buffer *eb, continue; /* don't skip BTRFS_FILE_EXTENT_PREALLOC, we can handle that */ disk_byte = btrfs_file_extent_disk_bytenr(eb, fi); - if (disk_byte != wanted_disk_byte) + if (disk_byte != ctx->bytenr) continue; - ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie); - if (ret < 0) + ret = check_extent_in_eb(ctx, &key, eb, fi, eie); + if (ret == BTRFS_ITERATE_EXTENT_INODES_STOP || ret < 0) return ret; } @@ -526,9 +548,9 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx, else goto next; if (!ctx->ignore_extent_item_pos) { - ret = check_extent_in_eb(&key, eb, fi, - ctx->extent_item_pos, &eie); - if (ret < 0) + ret = check_extent_in_eb(ctx, &key, eb, fi, &eie); + if (ret == BTRFS_ITERATE_EXTENT_INODES_STOP || + ret < 0) break; } if (ret > 0) @@ -551,10 +573,11 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx, ret = btrfs_next_old_item(root, path, ctx->time_seq); } - if (ret > 0) - ret = 0; - else if (ret < 0) + if (ret == BTRFS_ITERATE_EXTENT_INODES_STOP || ret < 0) free_inode_elem_list(eie); + else if (ret > 0) + ret = 0; + return ret; } @@ -1570,12 +1593,12 @@ static int find_parent_nodes(struct btrfs_backref_walk_ctx *ctx, if (!path->skip_locking) btrfs_tree_read_lock(eb); - ret = find_extent_in_eb(eb, ctx->bytenr, - ctx->extent_item_pos, &eie); + ret = find_extent_in_eb(ctx, eb, &eie); if (!path->skip_locking) btrfs_tree_read_unlock(eb); free_extent_buffer(eb); - if (ret < 0) + if (ret == BTRFS_ITERATE_EXTENT_INODES_STOP || + ret < 0) goto out; ref->inode_list = eie; /* @@ -1628,7 +1651,7 @@ static int find_parent_nodes(struct btrfs_backref_walk_ctx *ctx, prelim_release(&preftrees.indirect); prelim_release(&preftrees.indirect_missing_keys); - if (ret < 0) + if (ret == BTRFS_ITERATE_EXTENT_INODES_STOP || ret < 0) free_inode_elem_list(eie); return ret; } @@ -1654,7 +1677,8 @@ int btrfs_find_all_leafs(struct btrfs_backref_walk_ctx *ctx) return -ENOMEM; ret = find_parent_nodes(ctx, NULL); - if (ret < 0 && ret != -ENOENT) { + if (ret == BTRFS_ITERATE_EXTENT_INODES_STOP || + (ret < 0 && ret != -ENOENT)) { free_leaf_list(ctx->refs); ctx->refs = NULL; return ret; @@ -2402,6 +2426,9 @@ int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx, ulist_free(ctx->roots); ctx->roots = NULL; + if (ret == BTRFS_ITERATE_EXTENT_INODES_STOP) + ret = 0; + return ret; } diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 5005f579f235..a80d1e8be718 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -12,6 +12,25 @@ #include "disk-io.h" #include "extent_io.h" +/* + * Used by implementations of iterate_extent_inodes_t (see definition below) to + * signal that backref iteration can stop immediately and no error happened. + * The value must be non-negative and must not be 0, 1 (which is a common return + * value from things like btrfs_search_slot() and used internally in the backref + * walking code) and different from BACKREF_FOUND_SHARED and + * BACKREF_FOUND_NOT_SHARED + */ +#define BTRFS_ITERATE_EXTENT_INODES_STOP 5 + +/* + * Should return 0 if no errors happened and iteration of backrefs should + * continue. Can return BTRFS_ITERATE_EXTENT_INODES_STOP or any other non-zero + * value to immediately stop iteration and possibly signal an error back to + * the caller. + */ +typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 num_bytes, + u64 root, void *ctx); + /* * Context and arguments for backref walking functions. Some of the fields are * to be filled by the caller of such functions while other are filled by the @@ -70,15 +89,29 @@ struct btrfs_backref_walk_ctx { */ struct ulist *roots; /* - * Used by iterate_extent_inodes(). Lookup and store functions for an - * optional cache which maps the logical address (bytenr) of leaves - * to an array of root IDs. + * Used by iterate_extent_inodes() and the main backref walk code + * (find_parent_nodes()). Lookup and store functions for an optional + * cache which maps the logical address (bytenr) of leaves to an array + * of root IDs. */ bool (*cache_lookup)(u64 leaf_bytenr, void *user_ctx, const u64 **root_ids_ret, int *root_count_ret); void (*cache_store)(u64 leaf_bytenr, const struct ulist *root_ids, void *user_ctx); - /* Context object to pass to @cache_lookup and @cache_store. */ + /* + * If this is not NULL, then the backref walking code will call this + * for each indirect data extent reference as soon as it finds one, + * before collecting all the remaining backrefs and before resolving + * indirect backrefs. This allows for the caller to terminate backref + * walking as soon as it finds one backref that matches some specific + * criteria. The @cache_lookup and @cache_store callbacks should not + * be NULL in order to use this callback. + */ + iterate_extent_inodes_t *indirect_ref_iterator; + /* + * Context object to pass to the @cache_lookup, @cache_store and + * @indirect_ref_iterator callbacks. + */ void *user_ctx; }; @@ -145,9 +178,6 @@ struct btrfs_backref_share_check_ctx { int prev_extents_cache_slot; }; -typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 num_bytes, - u64 root, void *ctx); - struct btrfs_backref_share_check_ctx *btrfs_alloc_backref_share_check_ctx(void); void btrfs_free_backref_share_ctx(struct btrfs_backref_share_check_ctx *ctx); diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 4b79d52d74a4..67f82793315c 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -77,7 +77,7 @@ struct clone_root { u64 ino; u64 offset; u64 num_bytes; - u64 found_refs; + bool found_ref; }; #define SEND_CTX_MAX_NAME_CACHE_SIZE 128 @@ -1281,9 +1281,6 @@ struct backref_ctx { /* may be truncated in case it's the last extent in a file */ u64 extent_len; - - /* Just to check for bugs in backref resolving */ - int found_itself; }; static int __clone_root_cmp_bsearch(const void *key, const void *elt) @@ -1312,33 +1309,33 @@ static int __clone_root_cmp_sort(const void *e1, const void *e2) /* * Called for every backref that is found for the current extent. - * Results are collected in sctx->clone_roots->ino/offset/found_refs + * Results are collected in sctx->clone_roots->ino/offset. */ -static int __iterate_backrefs(u64 ino, u64 offset, u64 num_bytes, u64 root, - void *ctx_) +static int iterate_backrefs(u64 ino, u64 offset, u64 num_bytes, u64 root_id, + void *ctx_) { struct backref_ctx *bctx = ctx_; - struct clone_root *found; + struct clone_root *clone_root; /* First check if the root is in the list of accepted clone sources */ - found = bsearch((void *)(uintptr_t)root, bctx->sctx->clone_roots, - bctx->sctx->clone_roots_cnt, - sizeof(struct clone_root), - __clone_root_cmp_bsearch); - if (!found) + clone_root = bsearch((void *)(uintptr_t)root_id, bctx->sctx->clone_roots, + bctx->sctx->clone_roots_cnt, + sizeof(struct clone_root), + __clone_root_cmp_bsearch); + if (!clone_root) return 0; - if (found->root == bctx->sctx->send_root && + /* This is our own reference, bail out as we can't clone from it. */ + if (clone_root->root == bctx->sctx->send_root && ino == bctx->cur_objectid && - offset == bctx->cur_offset) { - bctx->found_itself = 1; - } + offset == bctx->cur_offset) + return 0; /* * Make sure we don't consider clones from send_root that are * behind the current inode/offset. */ - if (found->root == bctx->sctx->send_root) { + if (clone_root->root == bctx->sctx->send_root) { /* * If the source inode was not yet processed we can't issue a * clone operation, as the source extent does not exist yet at @@ -1359,7 +1356,7 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 num_bytes, u64 root, } bctx->found++; - found->found_refs++; + clone_root->found_ref = true; /* * If the given backref refers to a file extent item with a larger @@ -1367,10 +1364,17 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 num_bytes, u64 root, * we clone more optimally and end up doing less writes and getting * less exclusive, non-shared extents at the destination. */ - if (num_bytes > found->num_bytes) { - found->ino = ino; - found->offset = offset; - found->num_bytes = num_bytes; + if (num_bytes > clone_root->num_bytes) { + clone_root->ino = ino; + clone_root->offset = offset; + clone_root->num_bytes = num_bytes; + + /* + * Found a perfect candidate, so there's no need to continue + * backref walking. + */ + if (num_bytes >= bctx->extent_len) + return BTRFS_ITERATE_EXTENT_INODES_STOP; } return 0; @@ -1392,7 +1396,8 @@ static void empty_backref_cache(struct send_ctx *sctx) static bool lookup_backref_cache(u64 leaf_bytenr, void *ctx, const u64 **root_ids_ret, int *root_count_ret) { - struct send_ctx *sctx = ctx; + struct backref_ctx *bctx = ctx; + struct send_ctx *sctx = bctx->sctx; struct btrfs_fs_info *fs_info = sctx->send_root->fs_info; const u64 key = leaf_bytenr >> fs_info->sectorsize_bits; struct backref_cache_entry *entry; @@ -1430,7 +1435,8 @@ static bool lookup_backref_cache(u64 leaf_bytenr, void *ctx, static void store_backref_cache(u64 leaf_bytenr, const struct ulist *root_ids, void *ctx) { - struct send_ctx *sctx = ctx; + struct backref_ctx *bctx = ctx; + struct send_ctx *sctx = bctx->sctx; struct btrfs_fs_info *fs_info = sctx->send_root->fs_info; struct backref_cache_entry *new_entry; struct ulist_iterator uiter; @@ -1618,7 +1624,7 @@ static int find_extent_clone(struct send_ctx *sctx, cur_clone_root->ino = (u64)-1; cur_clone_root->offset = 0; cur_clone_root->num_bytes = 0; - cur_clone_root->found_refs = 0; + cur_clone_root->found_ref = false; } backref_ctx.sctx = sctx; @@ -1628,7 +1634,7 @@ static int find_extent_clone(struct send_ctx *sctx, /* * The last extent of a file may be too large due to page alignment. * We need to adjust extent_len in this case so that the checks in - * __iterate_backrefs work. + * iterate_backrefs() work. */ if (data_offset + num_bytes >= ino_size) backref_ctx.extent_len = ino_size - data_offset; @@ -1644,9 +1650,10 @@ static int find_extent_clone(struct send_ctx *sctx, backref_walk_ctx.fs_info = fs_info; backref_walk_ctx.cache_lookup = lookup_backref_cache; backref_walk_ctx.cache_store = store_backref_cache; - backref_walk_ctx.user_ctx = sctx; + backref_walk_ctx.indirect_ref_iterator = iterate_backrefs; + backref_walk_ctx.user_ctx = &backref_ctx; - ret = iterate_extent_inodes(&backref_walk_ctx, true, __iterate_backrefs, + ret = iterate_extent_inodes(&backref_walk_ctx, true, iterate_backrefs, &backref_ctx); if (ret < 0) goto out; @@ -1671,27 +1678,21 @@ static int find_extent_clone(struct send_ctx *sctx, } up_read(&fs_info->commit_root_sem); - if (!backref_ctx.found_itself) { - /* found a bug in backref code? */ - ret = -EIO; - btrfs_err(fs_info, - "did not find backref in send_root. inode=%llu, offset=%llu, disk_byte=%llu found extent=%llu", - ino, data_offset, disk_byte, found_key.objectid); - goto out; - } - btrfs_debug(fs_info, "find_extent_clone: data_offset=%llu, ino=%llu, num_bytes=%llu, logical=%llu", data_offset, ino, num_bytes, logical); - if (!backref_ctx.found) + if (!backref_ctx.found) { btrfs_debug(fs_info, "no clones found"); + ret = -ENOENT; + goto out; + } cur_clone_root = NULL; for (i = 0; i < sctx->clone_roots_cnt; i++) { struct clone_root *clone_root = &sctx->clone_roots[i]; - if (clone_root->found_refs == 0) + if (!clone_root->found_ref) continue; /* From patchwork Tue Nov 1 16:15:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027169 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C028C43219 for ; Tue, 1 Nov 2022 16:16:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230452AbiKAQQR (ORCPT ); Tue, 1 Nov 2022 12:16:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53732 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230416AbiKAQQP (ORCPT ); Tue, 1 Nov 2022 12:16:15 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 761A51C93C for ; Tue, 1 Nov 2022 09:16:13 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 05B23611DA for ; Tue, 1 Nov 2022 16:16:13 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E6EA0C43470 for ; Tue, 1 Nov 2022 16:16:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319372; bh=6epLISEFfyI08/+Qeoi+bykpHG6wNUweJJcKWNKrMV8=; h=From:To:Subject:Date:In-Reply-To:References:From; b=Aq6XvGB7eki/Gi1YyXrxaAKg2293RoZl91ZLylvSgn2miqP0MCc+knnEkCYGUmnAN gid6x7MA20OcjNZ7XqIYh8LWmfYZ6gjC35Tffq3G4nYvzcBOQls/AEn42AnomyRsJS xkzyEL2mwe1qH6IVqpQoUu+6iH5+IBdVcalz4CzlhHT/qY3MsDPjVSd41kT8d/pBTQ 3PoSudwNTuKfbdf6WFzKsr2W3sgGSPWQZIPEdzsHJjDMpMh/SzLU1fBxizdp6Axh7C 5i6O2WY/5ebXwoMMFBr2IGG/xBgr/9D+jz4w+xJBs5F/bFBPtyjdyr6+ia8hQ/JsmI RaMLCCZo5FVxg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 16/18] btrfs: send: avoid double extent tree search when finding clone source Date: Tue, 1 Nov 2022 16:15:52 +0000 Message-Id: <62dff1edae3e169e7f7095a65a3af87bb6c4099c.1667315100.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At find_extent_clone() we search twice for the extent item corresponding to the data extent that the current file extent items points to: 1) Once with a call to extent_from_logical(); 2) Once again during backref walking, through iterate_extent_inodes() which eventually leads to find_parent_nodes() where we will search again the extent tree for the same extent item. The extent tree can be huge, so doing this one extra search for every extent we want to send adds up and it's expensive. The first call is there since the send code was introduced and it accomplishes two things: 1) Check that the extent is flagged as a data extent in the extent tree. But it can not be anything else, otherwise we wouldn't have a file extent item in the send root pointing to it. This was probably added to catch bugs in the early days where send was yet too young and the interaction with everything else was far from perfect; 2) Check how many direct references there are on the extent, and if there's too many (more than SEND_MAX_EXTENT_REFS), avoid doing the backred walking as it may take too long and slowdown send. So improve on this by having a callback in the backref walking code that is called when it finds the extent item in the extent tree, and have those checks done in the callback. When the callback returns anything different from 0, it stops the backref walking code. This way we do a single search on the extent tree for the extent item of our data extent. Also, before this change we were only checking the number of references on the data extent against SEND_MAX_EXTENT_REFS, but after starting backref walking we will end up resolving backrefs for extent buffers in the path from a leaf having a file extent item pointing to our data extent, up to roots of trees from which the extent buffer is accessible from, due to shared subtrees resulting from snapshoting. We were therefore allowing for the possibility for send taking too long due to some node in the path from the leaf to a root node being shared too many times. After this change we check for reference counts being greater than SEND_MAX_EXTENT_REFS for both data extents and metadata extents. This change is part of a patchset comprised of the following patches: 01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs() 02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes() 03/17 btrfs: fix ulist leaks in error paths of qgroup self tests 04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests 05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone 06/17 btrfs: send: update comment at find_extent_clone() 07/17 btrfs: send: drop unnecessary backref context field initializations 08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source 09/17 btrfs: send: optimize clone detection to increase extent sharing 10/17 btrfs: use a single argument for extent offset in backref walking functions 11/17 btrfs: use a structure to pass arguments to backref walking functions 12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes() 13/17 btrfs: constify ulist parameter of ulist_next() 14/17 btrfs: send: cache leaf to roots mapping during backref walking 15/17 btrfs: send: skip unnecessary backref iterations 16/17 btrfs: send: avoid double extent tree search when finding clone source 17/17 btrfs: send: skip resolution of our own backref when finding clone source Performance test results are in the changelog of patch 17/17. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 31 ++++++++------ fs/btrfs/backref.h | 9 +++- fs/btrfs/send.c | 103 +++++++++++++++++++++------------------------ 3 files changed, 73 insertions(+), 70 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index bb59ba68d7ee..33056c4c0528 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1003,8 +1003,8 @@ static int add_delayed_refs(const struct btrfs_fs_info *fs_info, * * Returns 0 on success, <0 on error, or BACKREF_FOUND_SHARED. */ -static int add_inline_refs(const struct btrfs_fs_info *fs_info, - struct btrfs_path *path, u64 bytenr, +static int add_inline_refs(struct btrfs_backref_walk_ctx *ctx, + struct btrfs_path *path, int *info_level, struct preftrees *preftrees, struct share_check *sc) { @@ -1029,6 +1029,13 @@ static int add_inline_refs(const struct btrfs_fs_info *fs_info, BUG_ON(item_size < sizeof(*ei)); ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item); + + if (ctx->check_extent_item) { + ret = ctx->check_extent_item(ctx->bytenr, ei, leaf, ctx->user_ctx); + if (ret) + return ret; + } + flags = btrfs_extent_flags(leaf, ei); btrfs_item_key_to_cpu(leaf, &found_key, slot); @@ -1064,9 +1071,9 @@ static int add_inline_refs(const struct btrfs_fs_info *fs_info, switch (type) { case BTRFS_SHARED_BLOCK_REF_KEY: - ret = add_direct_ref(fs_info, preftrees, + ret = add_direct_ref(ctx->fs_info, preftrees, *info_level + 1, offset, - bytenr, 1, NULL, GFP_NOFS); + ctx->bytenr, 1, NULL, GFP_NOFS); break; case BTRFS_SHARED_DATA_REF_KEY: { struct btrfs_shared_data_ref *sdref; @@ -1075,14 +1082,14 @@ static int add_inline_refs(const struct btrfs_fs_info *fs_info, sdref = (struct btrfs_shared_data_ref *)(iref + 1); count = btrfs_shared_data_ref_count(leaf, sdref); - ret = add_direct_ref(fs_info, preftrees, 0, offset, - bytenr, count, sc, GFP_NOFS); + ret = add_direct_ref(ctx->fs_info, preftrees, 0, offset, + ctx->bytenr, count, sc, GFP_NOFS); break; } case BTRFS_TREE_BLOCK_REF_KEY: - ret = add_indirect_ref(fs_info, preftrees, offset, + ret = add_indirect_ref(ctx->fs_info, preftrees, offset, NULL, *info_level + 1, - bytenr, 1, NULL, GFP_NOFS); + ctx->bytenr, 1, NULL, GFP_NOFS); break; case BTRFS_EXTENT_DATA_REF_KEY: { struct btrfs_extent_data_ref *dref; @@ -1104,8 +1111,8 @@ static int add_inline_refs(const struct btrfs_fs_info *fs_info, root = btrfs_extent_data_ref_root(leaf, dref); - ret = add_indirect_ref(fs_info, preftrees, root, - &key, 0, bytenr, count, + ret = add_indirect_ref(ctx->fs_info, preftrees, root, + &key, 0, ctx->bytenr, count, sc, GFP_NOFS); break; @@ -1455,8 +1462,8 @@ static int find_parent_nodes(struct btrfs_backref_walk_ctx *ctx, if (key.objectid == ctx->bytenr && (key.type == BTRFS_EXTENT_ITEM_KEY || key.type == BTRFS_METADATA_ITEM_KEY)) { - ret = add_inline_refs(ctx->fs_info, path, ctx->bytenr, - &info_level, &preftrees, sc); + ret = add_inline_refs(ctx, path, &info_level, + &preftrees, sc); if (ret) goto out; ret = add_keyed_refs(root, path, ctx->bytenr, info_level, diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index a80d1e8be718..1bd5a15c7f9e 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -109,9 +109,14 @@ struct btrfs_backref_walk_ctx { */ iterate_extent_inodes_t *indirect_ref_iterator; /* - * Context object to pass to the @cache_lookup, @cache_store and - * @indirect_ref_iterator callbacks. + * If this is not NULL, then the backref walking code will call this for + * each extent item it's meant to process before it actually starts + * processing it. If this returns anything other than 0, then it stops + * the backref walking code immediately. */ + int (*check_extent_item)(u64 bytenr, const struct btrfs_extent_item *ei, + const struct extent_buffer *leaf, void *user_ctx); + /* Context object to pass to the callbacks defined above. */ void *user_ctx; }; diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 67f82793315c..f91cc95a0a3b 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1281,6 +1281,9 @@ struct backref_ctx { /* may be truncated in case it's the last extent in a file */ u64 extent_len; + + /* The bytenr the file extent item we are processing refers to. */ + u64 bytenr; }; static int __clone_root_cmp_bsearch(const void *key, const void *elt) @@ -1518,6 +1521,43 @@ static void store_backref_cache(u64 leaf_bytenr, const struct ulist *root_ids, sctx->backref_cache.size++; } +static int check_extent_item(u64 bytenr, const struct btrfs_extent_item *ei, + const struct extent_buffer *leaf, void *ctx) +{ + const u64 refs = btrfs_extent_refs(leaf, ei); + const struct backref_ctx *bctx = ctx; + const struct send_ctx *sctx = bctx->sctx; + + if (bytenr == bctx->bytenr) { + const u64 flags = btrfs_extent_flags(leaf, ei); + + if (WARN_ON(flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)) + return -EUCLEAN; + + /* + * If we have only one reference and only the send root as a + * clone source - meaning no clone roots were given in the + * struct btrfs_ioctl_send_args passed to the send ioctl - then + * it's our reference and there's no point in doing backref + * walking which is expensive, so exit early. + */ + if (refs == 1 && sctx->clone_roots_cnt == 1) + return -ENOENT; + } + + /* + * Backreference walking (iterate_extent_inodes() below) is currently + * too expensive when an extent has a large number of references, both + * in time spent and used memory. So for now just fallback to write + * operations instead of clone operations when an extent has more than + * a certain amount of references. + */ + if (refs > SEND_MAX_EXTENT_REFS) + return -ENOENT; + + return 0; +} + /* * Given an inode, offset and extent item, it finds a good clone for a clone * instruction. Returns -ENOENT when none could be found. The function makes @@ -1539,16 +1579,11 @@ static int find_extent_clone(struct send_ctx *sctx, u64 logical; u64 disk_byte; u64 num_bytes; - u64 extent_refs; - u64 flags = 0; struct btrfs_file_extent_item *fi; struct extent_buffer *eb = path->nodes[0]; struct backref_ctx backref_ctx = { 0 }; struct btrfs_backref_walk_ctx backref_walk_ctx = { 0 }; struct clone_root *cur_clone_root; - struct btrfs_key found_key; - struct btrfs_path *tmp_path; - struct btrfs_extent_item *ei; int compressed; u32 i; @@ -1574,48 +1609,6 @@ static int find_extent_clone(struct send_ctx *sctx, num_bytes = btrfs_file_extent_num_bytes(eb, fi); logical = disk_byte + btrfs_file_extent_offset(eb, fi); - tmp_path = alloc_path_for_send(); - if (!tmp_path) - return -ENOMEM; - - /* We only use this path under the commit sem */ - tmp_path->need_commit_sem = 0; - - down_read(&fs_info->commit_root_sem); - ret = extent_from_logical(fs_info, disk_byte, tmp_path, - &found_key, &flags); - up_read(&fs_info->commit_root_sem); - - if (ret < 0) - goto out; - if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { - ret = -EIO; - goto out; - } - - ei = btrfs_item_ptr(tmp_path->nodes[0], tmp_path->slots[0], - struct btrfs_extent_item); - extent_refs = btrfs_extent_refs(tmp_path->nodes[0], ei); - /* - * Backreference walking (iterate_extent_inodes() below) is currently - * too expensive when an extent has a large number of references, both - * in time spent and used memory. So for now just fallback to write - * operations instead of clone operations when an extent has more than - * a certain amount of references. - * - * Also, if we have only one reference and only the send root as a clone - * source - meaning no clone roots were given in the struct - * btrfs_ioctl_send_args passed to the send ioctl - then it's our - * reference and there's no point in doing backref walking which is - * expensive, so exit early. - */ - if ((extent_refs == 1 && sctx->clone_roots_cnt == 1) || - extent_refs > SEND_MAX_EXTENT_REFS) { - ret = -ENOENT; - goto out; - } - btrfs_release_path(tmp_path); - /* * Setup the clone roots. */ @@ -1630,6 +1623,7 @@ static int find_extent_clone(struct send_ctx *sctx, backref_ctx.sctx = sctx; backref_ctx.cur_objectid = ino; backref_ctx.cur_offset = data_offset; + backref_ctx.bytenr = disk_byte; /* * The last extent of a file may be too large due to page alignment. @@ -1644,19 +1638,20 @@ static int find_extent_clone(struct send_ctx *sctx, /* * Now collect all backrefs. */ - backref_walk_ctx.bytenr = found_key.objectid; + backref_walk_ctx.bytenr = disk_byte; if (compressed == BTRFS_COMPRESS_NONE) - backref_walk_ctx.extent_item_pos = logical - found_key.objectid; + backref_walk_ctx.extent_item_pos = btrfs_file_extent_offset(eb, fi); backref_walk_ctx.fs_info = fs_info; backref_walk_ctx.cache_lookup = lookup_backref_cache; backref_walk_ctx.cache_store = store_backref_cache; backref_walk_ctx.indirect_ref_iterator = iterate_backrefs; + backref_walk_ctx.check_extent_item = check_extent_item; backref_walk_ctx.user_ctx = &backref_ctx; ret = iterate_extent_inodes(&backref_walk_ctx, true, iterate_backrefs, &backref_ctx); if (ret < 0) - goto out; + return ret; down_read(&fs_info->commit_root_sem); if (fs_info->last_reloc_trans > sctx->last_reloc_trans) { @@ -1673,8 +1668,7 @@ static int find_extent_clone(struct send_ctx *sctx, * was already reallocated after the relocation. */ up_read(&fs_info->commit_root_sem); - ret = -ENOENT; - goto out; + return -ENOENT; } up_read(&fs_info->commit_root_sem); @@ -1684,8 +1678,7 @@ static int find_extent_clone(struct send_ctx *sctx, if (!backref_ctx.found) { btrfs_debug(fs_info, "no clones found"); - ret = -ENOENT; - goto out; + return -ENOENT; } cur_clone_root = NULL; @@ -1720,8 +1713,6 @@ static int find_extent_clone(struct send_ctx *sctx, ret = -ENOENT; } -out: - btrfs_free_path(tmp_path); return ret; } From patchwork Tue Nov 1 16:15:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027171 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 686C7C433FE for ; Tue, 1 Nov 2022 16:16:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230459AbiKAQQT (ORCPT ); Tue, 1 Nov 2022 12:16:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53736 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230424AbiKAQQP (ORCPT ); Tue, 1 Nov 2022 12:16:15 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B5E51C92A for ; Tue, 1 Nov 2022 09:16:14 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id DBB046165E for ; Tue, 1 Nov 2022 16:16:13 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CC9FBC433D6 for ; Tue, 1 Nov 2022 16:16:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319373; bh=GHbBMqokfNgIn3anPdP1PkxCGuJWX50FiVgFWFtH9nI=; h=From:To:Subject:Date:In-Reply-To:References:From; b=eKE102pfWrErQijJ2GCk2lXVwtFHCL6VDE8/2d5gQ5Cgmjw6VkSfFSYsPe6N6vw/r T/42OY2ribeMkMP+DooNj9ciyngDrIpFRBW7MTRXPc9HBPpb/DQUQOKIR9KwLaFN15 58rZXjLMvwzbHh84cxacac4h8qo6fmtRbvCy5zP+6zi0GhhTCRYYth9cZbAoj4U6Sk KoCZZz10U7sNaD2rUBKW44+srCbLH388dQM5SXQj5pLbPW2aBE/StUaBSciUaUBt1o 22DAuz5znnyAX1Q6QGI89+XQDNpZCFYkQdnWfDg8WXbgtzYAtpmdFUOSDaxn5jZ7e1 AUEJP1gE9lViQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 17/18] btrfs: send: skip resolution of our own backref when finding clone source Date: Tue, 1 Nov 2022 16:15:53 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When doing backref walking to determine a source range to clone from, it is worthless to collect and resolve our own data backref, as we can't obviously use it as a clone source and it represents the range we want to clone into. Collecting the backref implies doing the extra work to resolve it, doing the search for a file extent item in a subvolume tree, etc. Skipping the data backref is valid as long as we only have the send root as the single clone root, otherwise the leaf with the file extent item may be accessible from another clone root due to shared subtrees created by snapshots, and therefore we have to collect the backref and resolve it. So add a callback to the backref walking code to guide it to skip data backrefs. This change is part of a patchset comprised of the following patches: 01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs() 02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes() 03/17 btrfs: fix ulist leaks in error paths of qgroup self tests 04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests 05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone 06/17 btrfs: send: update comment at find_extent_clone() 07/17 btrfs: send: drop unnecessary backref context field initializations 08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source 09/17 btrfs: send: optimize clone detection to increase extent sharing 10/17 btrfs: use a single argument for extent offset in backref walking functions 11/17 btrfs: use a structure to pass arguments to backref walking functions 12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes() 13/17 btrfs: constify ulist parameter of ulist_next() 14/17 btrfs: send: cache leaf to roots mapping during backref walking 15/17 btrfs: send: skip unnecessary backref iterations 16/17 btrfs: send: avoid double extent tree search when finding clone source 17/17 btrfs: send: skip resolution of our own backref when finding clone source The following test was run on non-debug kernel (Debian's default kernel config) before and after applying the patchset: $ cat test-send-many-shared-extents.sh #!/bin/bash DEV=/dev/sdh MNT=/mnt/sdh umount $DEV &> /dev/null mkfs.btrfs -f $DEV mount $DEV $MNT num_files=50000 num_clones_per_file=50 for ((i = 1; i <= $num_files; i++)); do xfs_io -f -c "pwrite 0 64K" $MNT/file_$i > /dev/null echo -ne "\r$i files created..." done echo btrfs subvolume snapshot -r $MNT $MNT/snap1 cloned=0 for ((i = 1; i <= $num_clones_per_file; i++)); do for ((j = 1; j <= $num_files; j++)); do cp --reflink=always $MNT/file_$j $MNT/file_${j}_clone_${i} cloned=$((cloned + 1)) echo -ne "\r$cloned / $((num_files * num_clones_per_file)) clone operations" done done echo btrfs subvolume snapshot -r $MNT $MNT/snap2 # Unmount and mount again to clear all cached metadata (and data). umount $DEV mount $DEV $MNT start=$(date +%s%N) btrfs send $MNT/snap2 > /dev/null end=$(date +%s%N) dur=$(( (end - start) / 1000000000 )) echo -e "\nFull send took $dur seconds" # Unmount and mount again to clear all cached metadata (and data). umount $DEV mount $DEV $MNT start=$(date +%s%N) btrfs send -p $MNT/snap1 $MNT/snap2 > /dev/null end=$(date +%s%N) dur=$(( (end - start) / 1000000000 )) echo -e "\nIncremental send took $dur seconds" umount $MNT Before applying the patchset: (...) Full send took 1108 seconds (...) Incremental send took 1135 seconds After applying the whole patchset: (...) Full send took 268 seconds (-75.8%) (...) Incremental send took 316 seconds (-72.2%) Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 35 +++++++++++++++++++++-------------- fs/btrfs/backref.h | 9 +++++++++ fs/btrfs/send.c | 33 +++++++++++++++++++++++++++++++++ 3 files changed, 63 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 33056c4c0528..430974cf3b96 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1111,10 +1111,12 @@ static int add_inline_refs(struct btrfs_backref_walk_ctx *ctx, root = btrfs_extent_data_ref_root(leaf, dref); - ret = add_indirect_ref(ctx->fs_info, preftrees, root, - &key, 0, ctx->bytenr, count, - sc, GFP_NOFS); - + if (!ctx->skip_data_ref || + !ctx->skip_data_ref(root, key.objectid, key.offset, + ctx->user_ctx)) + ret = add_indirect_ref(ctx->fs_info, preftrees, + root, &key, 0, ctx->bytenr, + count, sc, GFP_NOFS); break; } default: @@ -1133,8 +1135,9 @@ static int add_inline_refs(struct btrfs_backref_walk_ctx *ctx, * * Returns 0 on success, <0 on error, or BACKREF_FOUND_SHARED. */ -static int add_keyed_refs(struct btrfs_root *extent_root, - struct btrfs_path *path, u64 bytenr, +static int add_keyed_refs(struct btrfs_backref_walk_ctx *ctx, + struct btrfs_root *extent_root, + struct btrfs_path *path, int info_level, struct preftrees *preftrees, struct share_check *sc) { @@ -1157,7 +1160,7 @@ static int add_keyed_refs(struct btrfs_root *extent_root, leaf = path->nodes[0]; btrfs_item_key_to_cpu(leaf, &key, slot); - if (key.objectid != bytenr) + if (key.objectid != ctx->bytenr) break; if (key.type < BTRFS_TREE_BLOCK_REF_KEY) continue; @@ -1169,7 +1172,7 @@ static int add_keyed_refs(struct btrfs_root *extent_root, /* SHARED DIRECT METADATA backref */ ret = add_direct_ref(fs_info, preftrees, info_level + 1, key.offset, - bytenr, 1, NULL, GFP_NOFS); + ctx->bytenr, 1, NULL, GFP_NOFS); break; case BTRFS_SHARED_DATA_REF_KEY: { /* SHARED DIRECT FULL backref */ @@ -1180,14 +1183,14 @@ static int add_keyed_refs(struct btrfs_root *extent_root, struct btrfs_shared_data_ref); count = btrfs_shared_data_ref_count(leaf, sdref); ret = add_direct_ref(fs_info, preftrees, 0, - key.offset, bytenr, count, + key.offset, ctx->bytenr, count, sc, GFP_NOFS); break; } case BTRFS_TREE_BLOCK_REF_KEY: /* NORMAL INDIRECT METADATA backref */ ret = add_indirect_ref(fs_info, preftrees, key.offset, - NULL, info_level + 1, bytenr, + NULL, info_level + 1, ctx->bytenr, 1, NULL, GFP_NOFS); break; case BTRFS_EXTENT_DATA_REF_KEY: { @@ -1211,9 +1214,13 @@ static int add_keyed_refs(struct btrfs_root *extent_root, } root = btrfs_extent_data_ref_root(leaf, dref); - ret = add_indirect_ref(fs_info, preftrees, root, - &key, 0, bytenr, count, - sc, GFP_NOFS); + + if (!ctx->skip_data_ref || + !ctx->skip_data_ref(root, key.objectid, key.offset, + ctx->user_ctx)) + ret = add_indirect_ref(fs_info, preftrees, root, + &key, 0, ctx->bytenr, + count, sc, GFP_NOFS); break; } default: @@ -1466,7 +1473,7 @@ static int find_parent_nodes(struct btrfs_backref_walk_ctx *ctx, &preftrees, sc); if (ret) goto out; - ret = add_keyed_refs(root, path, ctx->bytenr, info_level, + ret = add_keyed_refs(ctx, root, path, info_level, &preftrees, sc); if (ret) goto out; diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 1bd5a15c7f9e..ef6bbea3f456 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -116,6 +116,15 @@ struct btrfs_backref_walk_ctx { */ int (*check_extent_item)(u64 bytenr, const struct btrfs_extent_item *ei, const struct extent_buffer *leaf, void *user_ctx); + /* + * If this is not NULL, then the backref walking code will call this for + * each extent data ref it finds (BTRFS_EXTENT_DATA_REF_KEY keys) before + * processing that data ref. If this callback return false, then it will + * ignore this data ref and it will never resolve the indirect data ref, + * saving time searching for leaves in a fs tree with file extent items + * matching the data ref. + */ + bool (*skip_data_ref)(u64 root, u64 ino, u64 offset, void *user_ctx); /* Context object to pass to the callbacks defined above. */ void *user_ctx; }; diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index f91cc95a0a3b..1bcbe386a24b 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1284,6 +1284,10 @@ struct backref_ctx { /* The bytenr the file extent item we are processing refers to. */ u64 bytenr; + /* The owner (root id) of the data backref for the current extent. */ + u64 backref_owner; + /* The offset of the data backref for the current extent. */ + u64 backref_offset; }; static int __clone_root_cmp_bsearch(const void *key, const void *elt) @@ -1558,6 +1562,18 @@ static int check_extent_item(u64 bytenr, const struct btrfs_extent_item *ei, return 0; } +static bool skip_self_data_ref(u64 root, u64 ino, u64 offset, void *ctx) +{ + const struct backref_ctx *bctx = ctx; + + if (ino == bctx->cur_objectid && + root == bctx->backref_owner && + offset == bctx->backref_offset) + return true; + + return false; +} + /* * Given an inode, offset and extent item, it finds a good clone for a clone * instruction. Returns -ENOENT when none could be found. The function makes @@ -1624,6 +1640,12 @@ static int find_extent_clone(struct send_ctx *sctx, backref_ctx.cur_objectid = ino; backref_ctx.cur_offset = data_offset; backref_ctx.bytenr = disk_byte; + /* + * Use the header owner and not the send root's id, because in case of a + * snapshot we can have shared subtrees. + */ + backref_ctx.backref_owner = btrfs_header_owner(eb); + backref_ctx.backref_offset = data_offset - btrfs_file_extent_offset(eb, fi); /* * The last extent of a file may be too large due to page alignment. @@ -1648,6 +1670,17 @@ static int find_extent_clone(struct send_ctx *sctx, backref_walk_ctx.check_extent_item = check_extent_item; backref_walk_ctx.user_ctx = &backref_ctx; + /* + * If have a single clone root, then it's the send root and we can tell + * the backref walking code to skip our own backref and not resolve it, + * since we can not use it for cloning - the source and destination + * ranges can't overlap and in case the leaf is shared through a subtree + * due to snapshots, we can't use those other roots since they are not + * in the list of clone roots. + */ + if (sctx->clone_roots_cnt == 1) + backref_walk_ctx.skip_data_ref = skip_self_data_ref; + ret = iterate_extent_inodes(&backref_walk_ctx, true, iterate_backrefs, &backref_ctx); if (ret < 0) From patchwork Tue Nov 1 16:15:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13027172 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F31A5C43217 for ; Tue, 1 Nov 2022 16:16:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230445AbiKAQQU (ORCPT ); Tue, 1 Nov 2022 12:16:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53828 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230454AbiKAQQS (ORCPT ); Tue, 1 Nov 2022 12:16:18 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DCEA1C92F for ; Tue, 1 Nov 2022 09:16:17 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 8F794B81E6C for ; Tue, 1 Nov 2022 16:16:15 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B21E8C433C1 for ; Tue, 1 Nov 2022 16:16:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667319374; bh=tCqVCr31IoLZDNDBYNZprNyEgJm3UyVqNWvFZBh4Uws=; h=From:To:Subject:Date:In-Reply-To:References:From; b=ujH9fwRv1Ugsf1nMcgZgapG7MY2leOkQIgzn+99AD0Pv5gmmv+67GEbIBQkgUl3uo zXKVWuvFN7UgCywJCU5qycwCeeG5AVjLEOoKoEwMZRTn2gkRmCGp69vs46nfvn100G qmmd9paZdm+bL0o7uGN54MyJ4KV3+iPd7zQcDFe9Lmrm8xPl24OntLzeidvXkB/Oek wuRjMaTf6UmnyhR7Qw/8OLHNgsAvlwUOs8xLxc2TqNOnyS9NuSN0nUXgCYvpVTFCku c84s4Mgx5LIDavktbDoIGSlKxBBx+ZHyx5hFIZMAVs+MpjRBFwR9w3FDND7bC75Nh0 Xh2TEur8ix1gA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 18/18] btrfs: send: bump the extent reference count limit for backref walking Date: Tue, 1 Nov 2022 16:15:54 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana After the previous patchset which is comprised of the following patches: 01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs() 02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes() 03/17 btrfs: fix ulist leaks in error paths of qgroup self tests 04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests 05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone 06/17 btrfs: send: update comment at find_extent_clone() 07/17 btrfs: send: drop unnecessary backref context field initializations 08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source 09/17 btrfs: send: optimize clone detection to increase extent sharing 10/17 btrfs: use a single argument for extent offset in backref walking functions 11/17 btrfs: use a structure to pass arguments to backref walking functions 12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes() 13/17 btrfs: constify ulist parameter of ulist_next() 14/17 btrfs: send: cache leaf to roots mapping during backref walking 15/17 btrfs: send: skip unnecessary backref iterations 16/17 btrfs: send: avoid double extent tree search when finding clone source 17/17 btrfs: send: skip resolution of our own backref when finding clone source we have now much better performance when doing backref walking in the send code, so we can increase the current limit from 64 to 1024 references. This limit is still a bit conservative because there are still edge cases where backref walking will be too slow and spend a lot of cpu time, some IO reading b+tree nodes/leaves and memory. The goal is to eventually get rid of any limit, but for now bump it as it benefits users with extents shared more than 64 times and up to 1024 times, allowing for more deduplication at the destination without having to run a dedupe tool after a receive. Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 1bcbe386a24b..6950d3f9cbc1 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -39,7 +39,7 @@ * avoid hitting limitations of the backreference walking code (taking a lot of * time and using too much memory for extents with large number of references). */ -#define SEND_MAX_EXTENT_REFS 64 +#define SEND_MAX_EXTENT_REFS 1024 /* * A fs_path is a helper to dynamically build path names with unknown size.