From patchwork Mon Oct 10 10:22:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002436 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D22D4C433FE for ; Mon, 10 Oct 2022 10:22:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231709AbiJJKWf (ORCPT ); Mon, 10 Oct 2022 06:22:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231555AbiJJKW1 (ORCPT ); Mon, 10 Oct 2022 06:22:27 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF6F36AEA9 for ; Mon, 10 Oct 2022 03:22:25 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 21FC560EB1 for ; Mon, 10 Oct 2022 10:22:25 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0F984C433D6 for ; Mon, 10 Oct 2022 10:22:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397344; bh=9hY7j+XMiQtgukI4CuGUX/VxaSxrnWX+lBYbBL+CydA=; h=From:To:Subject:Date:In-Reply-To:References:From; b=dwfo03TEeCAcv+y2V/yIAi6ikwOvmwlwdvqgDFbVVwb9div7z/Rr4H3Blxdlz454T aSrfnIsLGr78MMbYq+9Ypl2ANmI+UM83HYZJkZtdqkS4ZRCf68+786CEqbWLdoWdry +5584mVpkpz2gwbggaDY0zUGSlIPnlDyuiRxHMLEsjzpGB1ybaq+cGesXQdOQc0wZR dcxtyx5XsK3UpmOMjAuPWwsruwo+0djlFXwX0gT7gOKCAOFkzYVZnTih7q5A7vuTJA dVIHC01U7hn/ZY61THN7Hr7ItZlXki21XL0600m/nkbdD19d8e5YKiZuaKe1Biha2Q rszcKR75/SsUw== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 01/18] btrfs: fix processing of delayed tree block refs during backref walking Date: Mon, 10 Oct 2022 11:22:03 +0100 Message-Id: <37ea29070f26f59765d91cdfcdf1d8fd414b6be1.1665396437.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During backref walking, when processing a delayed reference with a type of BTRFS_TREE_BLOCK_REF_KEY, we have two bugs there: 1) We are accessing the delayed references extent_op, and its key, without the protection of the delayed ref head's lock; 2) If there's no extent op for the delayed ref head, we end up with an uninitialized key in the stack, variable 'tmp_op_key', and then pass it to add_indirect_ref(), which adds the reference to the indirect refs rb tree. This is wrong, because indirect references should have a NULL key when we don't have access to the key, and in that case they should be added to the indirect_missing_keys rb tree and not to the indirect rb tree. This means that if have BTRFS_TREE_BLOCK_REF_KEY delayed ref resulting from freeing an extent buffer, therefore with a count of -1, it will not cancel out the corresponding reference we have in the extent tree (with a count of 1), since both references end up in different rb trees. When using fiemap, where we often need to check if extents are shared through shared subtrees resulting from snapshots, it means we can incorrectly report an extent as shared when it's no longer shared. However this is temporary because after the transaction is committed the extent is no longer reported as shared, as running the delayed reference results in deleting the tree block reference from the extent tree. Outside the fiemap context, the result is unpredictable, as the key was not initialized but it's used when navigating the rb trees to insert and search for references (prelim_ref_compare()), and we expect all references in the indirect rb tree to have valid keys. The following reproducer triggers the second bug: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount -o compress $DEV $MNT # With a compressed 128M file we get a tree height of 2 (level 1 root). xfs_io -f -c "pwrite -b 1M 0 128M" $MNT/foo btrfs subvolume snapshot $MNT $MNT/snap # Fiemap should output 0x2008 in the flags column. # 0x2000 means shared extent # 0x8 means encoded extent (because it's compressed) echo echo "fiemap after snapshot, range [120M, 120M + 128K):" xfs_io -c "fiemap -v 120M 128K" $MNT/foo echo # Overwrite one extent and fsync to flush delalloc and COW a new path # in the snapshot's tree. # # After this we have a BTRFS_DROP_DELAYED_REF delayed ref of type # BTRFS_TREE_BLOCK_REF_KEY with a count of -1 for every COWed extent # buffer in the path. # # In the extent tree we have inline references of type # BTRFS_TREE_BLOCK_REF_KEY, with a count of 1, for the same extent # buffers, so they should cancel each other, and the extent buffers in # the fs tree should no longer be considered as shared. # echo "Overwriting file range [120M, 120M + 128K)..." xfs_io -c "pwrite -b 128K 120M 128K" $MNT/snap/foo xfs_io -c "fsync" $MNT/snap/foo # Fiemap should output 0x8 in the flags column. The extent in the range # [120M, 120M + 128K) is no longer shared, it's now exclusive to the fs # tree. echo echo "fiemap after overwrite range [120M, 120M + 128K):" xfs_io -c "fiemap -v 120M 128K" $MNT/foo echo umount $MNT Running it before this patch: $ ./test.sh (...) wrote 134217728/134217728 bytes at offset 0 128 MiB, 128 ops; 0.1152 sec (1.085 GiB/sec and 1110.5809 ops/sec) Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap' fiemap after snapshot, range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 Overwriting file range [120M, 120M + 128K)... wrote 131072/131072 bytes at offset 125829120 128 KiB, 1 ops; 0.0001 sec (683.060 MiB/sec and 5464.4809 ops/sec) fiemap after overwrite range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 The extent in the range [120M, 120M + 128K) is still reported as shared (0x2000 bit set) after overwriting that range and flushing delalloc, which is not correct - an entire path was COWed in the snapshot's tree and the extent is now only referenced by the original fs tree. Running it after this patch: $ ./test.sh (...) wrote 134217728/134217728 bytes at offset 0 128 MiB, 128 ops; 0.1198 sec (1.043 GiB/sec and 1068.2067 ops/sec) Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap' fiemap after snapshot, range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 Overwriting file range [120M, 120M + 128K)... wrote 131072/131072 bytes at offset 125829120 128 KiB, 1 ops; 0.0001 sec (694.444 MiB/sec and 5555.5556 ops/sec) fiemap after overwrite range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x8 Now the extent is not reported as shared anymore. So fix this by passing a NULL key pointer to add_indirect_ref() when processing a delayed reference for a tree block if there's no extent op for our delayed ref head with a defined key. Also access the extent op only after locking the delayed ref head's lock. The reproducer will be converted later to a test case for fstests. Fixes: 86d5f994425252 ("btrfs: convert prelimary reference tracking to use rbtrees") Fixes: a6dbceafb915e8 ("btrfs: Remove unused op_key var from add_delayed_refs") Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 3c0c1f626c75..41a5e650e901 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -820,16 +820,11 @@ static int add_delayed_refs(const struct btrfs_fs_info *fs_info, struct preftrees *preftrees, struct share_check *sc) { struct btrfs_delayed_ref_node *node; - struct btrfs_delayed_extent_op *extent_op = head->extent_op; struct btrfs_key key; - struct btrfs_key tmp_op_key; struct rb_node *n; int count; int ret = 0; - if (extent_op && extent_op->update_key) - btrfs_disk_key_to_cpu(&tmp_op_key, &extent_op->key); - spin_lock(&head->lock); for (n = rb_first_cached(&head->ref_tree); n; n = rb_next(n)) { node = rb_entry(n, struct btrfs_delayed_ref_node, @@ -855,10 +850,16 @@ static int add_delayed_refs(const struct btrfs_fs_info *fs_info, case BTRFS_TREE_BLOCK_REF_KEY: { /* NORMAL INDIRECT METADATA backref */ struct btrfs_delayed_tree_ref *ref; + struct btrfs_key *key_ptr = NULL; + + if (head->extent_op && head->extent_op->update_key) { + btrfs_disk_key_to_cpu(&key, &head->extent_op->key); + key_ptr = &key; + } ref = btrfs_delayed_node_to_tree_ref(node); ret = add_indirect_ref(fs_info, preftrees, ref->root, - &tmp_op_key, ref->level + 1, + key_ptr, ref->level + 1, node->bytenr, count, sc, GFP_ATOMIC); break; From patchwork Mon Oct 10 10:22:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81CD8C433F5 for ; Mon, 10 Oct 2022 10:22:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231809AbiJJKWj (ORCPT ); Mon, 10 Oct 2022 06:22:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45568 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231660AbiJJKWe (ORCPT ); Mon, 10 Oct 2022 06:22:34 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2B2395A816 for ; Mon, 10 Oct 2022 03:22:26 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 0575C60ECC for ; Mon, 10 Oct 2022 10:22:26 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E85CCC433B5 for ; Mon, 10 Oct 2022 10:22:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397345; bh=p3sy30wSeX3U4FetLvIOeJ55UoZ8vhRvWAubvcXN8pc=; h=From:To:Subject:Date:In-Reply-To:References:From; b=O3rlgtxr+eY2CJMdFpntIuD3knJfb4jR0w/2BaAlmhDxRNO0Nugwx8htJW7Y66SUC 0blk60FZ7shf9kCMBNlblUlTP8du+UEmhUH5nqus+I2MGBwSZoo1DWcJiDotNyUUOM y2/W5BDXcvq5hi2KJmR9NXnO+Lt5KY3ZGERsoettLyxGMnHLlQMP0FwsXpM6rlQ71i yBwAmOPbV+O+d8TFnbzikWxuFltSFebSFSZjjtqNr6lVasJPL0f0Nchd841mLOMftN QyGFHeZgvXaV3FtQeXBGqYfRpQat6LovQA3bTIIe7xRY2vzbAtvJ79sQhjGXBzlJHo IlfRHc9Ls27aw== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 02/18] btrfs: ignore fiemap path cache if we have multiple leaves for a data extent Date: Mon, 10 Oct 2022 11:22:04 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana The path cache used during fiemap used to determine the sharedness of extent buffers in a path from a leaf containing a file extent item pointing to our data extent up to the root node of the tree, is meant to be used for a single path. Having a single path is by far the most common case, and therefore worth to optimize for, but it's possible to actually have multiple paths because we have 2 or more leaves. If we have multiple leaves, the 'level' variable keeps getting incremented in each iteration of the while loop at btrfs_is_data_extent_shared(), which means we will treat the second leaf in the 'tmp' ulist as a level 1 node, and so forth. In the worst case this can lead to getting a level greater than or equals to BTRFS_MAX_LEVEL (8), which will trigger a WARN_ON_ONCE() in the functions to lookup from or store in the path cache (lookup_backref_shared_cache() and store_backref_shared_cache()). If the current level never goes beyond 8, due to shared nodes in the paths and a fs tree height smaller than 8, it can still result in incorrectly marking one leaf as shared because some other leaf is shared and is stored one level below that other leaf, as when storing a true sharedness value in the cache results in updating the sharedness to true of all entries in the cache below the current level. Having multiple leaves happens in a case like the following: - We have a file extent item point to data extent at bytenr X, for a file range [0, 1M[ for example; - At this moment we have an extent data ref for the extent, with an offset of 0 and a count of 1; - A write into the middle of the extent happens, file range [64K, 128K) so the file extent item is split into two (at btrfs_drop_extents()): 1) One for file range [0, 64K), with a length (num_bytes field) of 64K and an extent offset of 0; 2) Another one for file range [128K, 1M), with a length of 896K (1M - 128K) and an extent offset of 128K. - At this moment the two file extent items are located in the same leaf; - A new file extent item for the range [64K, 128K), pointing to a new data extent, is inserted in the leaf. This results in a leaf split and now those two file extent items pointing to data extent X end up located in different leaves; - Once delayed refs are run, we still have a single extent data ref item for our data extent at bytenr X, for offset 0, but now with a count of 2 instead of 1; - So during fiemap, at btrfs_is_data_extent_shared(), after we call find_parent_nodes() for the data extent, we get two leaves, since we have two file extent items point to data extent at bytenr X that are located in two different leaves. So skip the use of the path cache when we get more than one leaf. Fixes: 12a824dc67a61e ("btrfs: speedup checking for extent sharedness during fiemap") Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 25 +++++++++++++++++++++++++ fs/btrfs/backref.h | 1 + 2 files changed, 26 insertions(+) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 41a5e650e901..ac07a6587719 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1523,6 +1523,9 @@ static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache { struct btrfs_backref_shared_cache_entry *entry; + if (!cache->use_cache) + return false; + if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)) return false; @@ -1587,6 +1590,9 @@ static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache, struct btrfs_backref_shared_cache_entry *entry; u64 gen; + if (!cache->use_cache) + return; + if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)) return; @@ -1683,6 +1689,7 @@ int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr, /* -1 means we are in the bytenr of the data extent. */ level = -1; ULIST_ITER_INIT(&uiter); + cache->use_cache = true; while (1) { bool is_shared; bool cached; @@ -1712,6 +1719,24 @@ int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr, extent_gen > btrfs_root_last_snapshot(&root->root_item)) break; + /* + * If our data extent was not directly shared (without multiple + * reference items), than it might have a single reference item + * with a count > 1 for the same offset, which means there are 2 + * (or more) file extent items that point to the data extent - + * this happens when a file extent item needs to be split and + * then one item gets moved to another leaf due to a b+tree leaf + * split when inserting some item. In this case the file extent + * items may be located in different leaves and therefore some + * of the leaves may be referenced through shared subtrees while + * others are not. Since our extent buffer cache only works for + * a single path (by far the most common case and simpler to + * deal with), we can not use it if we have multiple leaves + * (which implies multiple paths). + */ + if (level == -1 && tmp->nnodes > 1) + cache->use_cache = false; + if (level >= 0) store_backref_shared_cache(cache, root, bytenr, level, false); diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 52ae6957b414..8e69584d538d 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -29,6 +29,7 @@ struct btrfs_backref_shared_cache { * a given data extent should never exceed the maximum b+tree height. */ struct btrfs_backref_shared_cache_entry entries[BTRFS_MAX_LEVEL]; + bool use_cache; }; typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root, From patchwork Mon Oct 10 10:22:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002440 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA0E0C433FE for ; Mon, 10 Oct 2022 10:22:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231555AbiJJKWl (ORCPT ); Mon, 10 Oct 2022 06:22:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45562 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231656AbiJJKWe (ORCPT ); Mon, 10 Oct 2022 06:22:34 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F8DA6B159 for ; Mon, 10 Oct 2022 03:22:27 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D76FC60ECE for ; Mon, 10 Oct 2022 10:22:26 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CE4C8C433C1 for ; Mon, 10 Oct 2022 10:22:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397346; bh=XEHhBU03Mz+hEp5lxPdN8/D4vUQeaLBenLlnWZU9ya4=; h=From:To:Subject:Date:In-Reply-To:References:From; b=orIfnMFPBGNv0ART9CKmJz+qxLLKk7bPysh7hYceqfNXnCpBTnfbiw4EJ1oOMuRP6 k8An3j15mxNtsQ2w3ft7s0gmnPsJVgN+isgLFwitRUWlErKIAiaG0JXAfEWX3kV2KU B4qvyMDpWw9sk69x8txthY7tvd9SbPU0ELvJWK0Qp9ZFoOYeBrBPRL5mBMlyTqwgmi gh/bZiGSz1zENXH294GRPsTRkSDSTHIP+oWrTebgqrZseObEmwyIyWXHZQflVyl+QV 4LoMf3FuyGrKr1/2AIzZWWDLbGMfpAgTBfdTsw/1s/UP5BPm7S0buBeH9wKntcraIl JWUTN8K2TgnTw== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 03/18] btrfs: get the next extent map during fiemap/lseek more efficiently Date: Mon, 10 Oct 2022 11:22:05 +0100 Message-Id: <2ec5bdb13c1bb6a2d1fb883000f695037cb0f3ac.1665396437.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At find_delalloc_subrange(), when we need to get the next extent map, we do a full search on the extent map tree (a red black tree). This is fine but it's a lot more efficient to simply use rb_next(), which typically requires iterating over less nodes of the tree and never needs to compare the ranges of nodes with the one we are looking for. So add a public helper to extent_map.{h,c} to get the extent map that immediately follows another extent map, using rb_next(), and use that helper at find_delalloc_subrange(). Signed-off-by: Filipe Manana --- fs/btrfs/extent_map.c | 31 +++++++++++++++++++++++++++++- fs/btrfs/extent_map.h | 2 ++ fs/btrfs/file.c | 44 ++++++++++++++++++++++++++----------------- 3 files changed, 59 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 6092a4eedc92..715979807ae1 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -523,7 +523,7 @@ void replace_extent_mapping(struct extent_map_tree *tree, setup_extent_mapping(tree, new, modified); } -static struct extent_map *next_extent_map(struct extent_map *em) +static struct extent_map *next_extent_map(const struct extent_map *em) { struct rb_node *next; @@ -533,6 +533,35 @@ static struct extent_map *next_extent_map(struct extent_map *em) return container_of(next, struct extent_map, rb_node); } +/* + * Get the extent map that immediately follows another one. + * + * @tree: The extent map tree that the extent map belong to. + * Holding read or write access on the tree's lock is required. + * @em: An extent map from the given tree. The caller must ensure that + * between getting @em and between calling this function, the + * extent map @em is not removed from the tree - for example, by + * holding the tree's lock for the duration of those 2 operations. + * + * Returns the extent map that immediately follows @em, or NULL if @em is the + * last extent map in the tree. + */ +struct extent_map *btrfs_next_extent_map(const struct extent_map_tree *tree, + const struct extent_map *em) +{ + struct extent_map *next; + + /* The lock must be acquired either in read mode or write mode. */ + lockdep_assert_held(&tree->lock); + ASSERT(extent_map_in_tree(em)); + + next = next_extent_map(em); + if (next) + refcount_inc(&next->refs); + + return next; +} + static struct extent_map *prev_extent_map(struct extent_map *em) { struct rb_node *prev; diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h index ad311864272a..68d3f2c9ea1d 100644 --- a/fs/btrfs/extent_map.h +++ b/fs/btrfs/extent_map.h @@ -87,6 +87,8 @@ static inline u64 extent_map_block_end(struct extent_map *em) void extent_map_tree_init(struct extent_map_tree *tree); struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree, u64 start, u64 len); +struct extent_map *btrfs_next_extent_map(const struct extent_map_tree *tree, + const struct extent_map *em); int add_extent_mapping(struct extent_map_tree *tree, struct extent_map *em, int modified); void remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 176b432035ae..4a9a2e660b42 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -3550,40 +3550,50 @@ static bool find_delalloc_subrange(struct btrfs_inode *inode, u64 start, u64 end */ read_lock(&em_tree->lock); em = lookup_extent_mapping(em_tree, start, len); - read_unlock(&em_tree->lock); + if (!em) { + read_unlock(&em_tree->lock); + return (delalloc_len > 0); + } /* extent_map_end() returns a non-inclusive end offset. */ - em_end = em ? extent_map_end(em) : 0; + em_end = extent_map_end(em); /* * If we have a hole/prealloc extent map, check the next one if this one * ends before our range's end. */ - if (em && (em->block_start == EXTENT_MAP_HOLE || - test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) && em_end < end) { + if ((em->block_start == EXTENT_MAP_HOLE || + test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) && em_end < end) { struct extent_map *next_em; - read_lock(&em_tree->lock); - next_em = lookup_extent_mapping(em_tree, em_end, len - em_end); - read_unlock(&em_tree->lock); - + next_em = btrfs_next_extent_map(em_tree, em); free_extent_map(em); - em_end = next_em ? extent_map_end(next_em) : 0; + + /* + * There's no next extent map or the next one starts beyond our + * range, return the range found in the io tree (if any). + */ + if (!next_em || next_em->start > end) { + read_unlock(&em_tree->lock); + free_extent_map(next_em); + return (delalloc_len > 0); + } + + em_end = extent_map_end(next_em); em = next_em; } - if (em && (em->block_start == EXTENT_MAP_HOLE || - test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) { - free_extent_map(em); - em = NULL; - } + read_unlock(&em_tree->lock); /* - * No extent map or one for a hole or prealloc extent. Use the delalloc - * range we found in the io tree if we have one. + * We have a hole or prealloc extent that ends at or beyond our range's + * end, return the range found in the io tree (if any). */ - if (!em) + if (em->block_start == EXTENT_MAP_HOLE || + test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) { + free_extent_map(em); return (delalloc_len > 0); + } /* * We don't have any range as EXTENT_DELALLOC in the io tree, so the From patchwork Mon Oct 10 10:22:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002438 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3C4FC4332F for ; Mon, 10 Oct 2022 10:22:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231779AbiJJKWi (ORCPT ); Mon, 10 Oct 2022 06:22:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231671AbiJJKWe (ORCPT ); Mon, 10 Oct 2022 06:22:34 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD0CA6B15D for ; Mon, 10 Oct 2022 03:22:28 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id BFF4860ED7 for ; Mon, 10 Oct 2022 10:22:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B2E43C433D7 for ; Mon, 10 Oct 2022 10:22:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397347; bh=vcWQDX1BRTC9RFf4yU2jDuyPIPNMF0Kc/dx+1tTHMpQ=; h=From:To:Subject:Date:In-Reply-To:References:From; b=e8qtv4ZJ4RpIJ4gjar2Qo95IPMhcRt4lD/IwF7d95O41uzX5ClDITMj7gcBXXFfH7 iIMGqYbX9UAxbAipV3gnicecc1GpiydMCON3MDIDvxCDy7CRH4PsDjU6c8ZIF7jD5g prVdUV7iTT18aB7nZYEy8OtFXLoDQiFu6/3eCM8003A07uvlIXS4GyUYPeP2aw6gyS QHoMHyQc23yK2On/HwTJWkk0f1YDvgnBiB7Rctkp5hY/dToLcyqQZ2S/L985qkpdg4 +3tjB0/Q+OCI7/JVE31VLTtBVKg+5cLCAKV3i6BtNlXFT7FrvEpOGTq7/8Ya/+OYIe 689qeGojfG+Ow== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 04/18] btrfs: skip unnecessary extent map searches during fiemap and lseek Date: Mon, 10 Oct 2022 11:22:06 +0100 Message-Id: <767d0afc7bd89abcd067112b1f371ce650dc3059.1665396437.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana If we have no outstanding extents it means we don't have any extent maps corresponding to delalloc that is flushing, as when an ordered extent is created we increment the number of outstanding extents to 1 and when we remove the ordered extent we decrement them by 1. So skip extent map tree searches if the number of outstanding ordered extents is 0, saving time as the tree is not empty if we have previously made some reads or flushed delalloc, as in those cases it can have a very large number of extent maps for files with many extents. This helps save time when processing a file range corresponding to a hole or prealloc (unwritten) extent. The next patch in the series has a performance test in its changelog and its subject is: "btrfs: skip unnecessary delalloc search during fiemap and lseek" Signed-off-by: Filipe Manana --- fs/btrfs/file.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 4a9a2e660b42..36618ddadc5f 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -3534,6 +3534,18 @@ static bool find_delalloc_subrange(struct btrfs_inode *inode, u64 start, u64 end if (delalloc_len > 0) *delalloc_end_ret = *delalloc_start_ret + delalloc_len - 1; + spin_lock(&inode->lock); + if (inode->outstanding_extents == 0) { + /* + * No outstanding extents means we don't have any delalloc that + * is flushing, so return the unflushed range found in the io + * tree (if any). + */ + spin_unlock(&inode->lock); + return (delalloc_len > 0); + } + spin_unlock(&inode->lock); + /* * Now also check if there's any extent map in the range that does not * map to a hole or prealloc extent. We do this because: From patchwork Mon Oct 10 10:22:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002442 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01B4EC433F5 for ; Mon, 10 Oct 2022 10:22:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231896AbiJJKWq (ORCPT ); Mon, 10 Oct 2022 06:22:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45644 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231753AbiJJKWh (ORCPT ); Mon, 10 Oct 2022 06:22:37 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F8A76B153 for ; Mon, 10 Oct 2022 03:22:31 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 50B90B80E0F for ; Mon, 10 Oct 2022 10:22:29 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9841DC4314D for ; Mon, 10 Oct 2022 10:22:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397348; bh=obDzy7qbrC5qdorPpxZ4XfK5vK/3XuFfiX+KdPQlbgs=; h=From:To:Subject:Date:In-Reply-To:References:From; b=JCacQ3sk8PlRIDFx5fy4OqK1T12TPz+I99r2lFInwTfyp52XWHtN+PVlI9TS01hK0 F97b0nwLXKhTpDGHFqu6SgTlofmlQrsQMsyPNvxsBCKs1OpAzvSus9Ux0Fwb0MOath 4udu97B5Iekl8xuDL823/QiIc/gDQOu1uMibbBu9iUd7558tXdMgA7aiVTP/Jah4xx VhUaIiHPnHPHiauEmna9l7S6xRCwcT5Uvn1/A2p2NW7QLT6QcbH6ZkRCbHMJuqr2kB IbA9am20MXJXO1X/8uueu7M0Gw6BCxSiAWmt2xpxzx3kmm22KWqVyxjZZJv/UW6Eyt n7oqGxBfvgTQg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 05/18] btrfs: skip unnecessary delalloc search during fiemap and lseek Date: Mon, 10 Oct 2022 11:22:07 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During fiemap and lseek (hole and data seeking), there's no point in iterating the inode's io tree to count delalloc bits if the inode's delalloc bytes counter has a value of zero, as that counter is updated whenever we set a range for delalloc or clear a range from delalloc. So skip the counting and io tree iteration if the inode's delalloc bytes counter has a value of zero. This helps save time when processing a file range corresponding to a hole or prealloc (unwritten) extent. This patch is part of a series comprised of the following patches: btrfs: get the next extent map during fiemap/lseek more efficiently btrfs: skip unnecessary extent map searches during fiemap and lseek btrfs: skip unnecessary delalloc search during fiemap and lseek The following test was performed on a release kernel (Debian's default kernel config) before and after applying those 3 patches. # Wrapper to call fiemap in extent count only mode. # (struct fiemap::fm_extent_count set to 0) $ cat fiemap.c #include #include #include #include #include #include #include #include #include int main(int argc, char **argv) { struct fiemap fiemap = { 0 }; int fd; if (argc != 2) { printf("usage: %s \n", argv[0]); return 1; } fd = open(argv[1], O_RDONLY); if (fd < 0) { fprintf(stderr, "error opening file: %s\n", strerror(errno)); return 1; } /* fiemap.fm_extent_count set to 0, to count extents only. */ fiemap.fm_length = FIEMAP_MAX_OFFSET; if (ioctl(fd, FS_IOC_FIEMAP, &fiemap) < 0) { fprintf(stderr, "fiemap error: %s\n", strerror(errno)); return 1; } close(fd); printf("fm_mapped_extents = %d\n", fiemap.fm_mapped_extents); return 0; } $ gcc -o fiemap fiemap.c And the wrapper shell script that creates a file with many holes and runs fiemap against it: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT FILE_SIZE=$((1 * 1024 * 1024 * 1024)) echo -n > $MNT/foobar for ((off = 0; off < $FILE_SIZE; off += 8192)); do xfs_io -c "pwrite -S 0xab $off 4K" $MNT/foobar > /dev/null done # flush all delalloc sync start=$(date +%s%N) ./fiemap $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds" umount $MNT Result before applying patchset: fm_mapped_extents = 131072 fiemap took 63 milliseconds Result after applying patchset: fm_mapped_extents = 131072 fiemap took 39 milliseconds (-38.1%) Running the same test for a 512M file instead of a 1G file, gave the following results. Result before applying patchset: fm_mapped_extents = 65536 fiemap took 29 milliseconds Result after applying patchset: fm_mapped_extents = 65536 fiemap took 20 milliseconds (-31.0%) Signed-off-by: Filipe Manana --- fs/btrfs/file.c | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 36618ddadc5f..0d0c662007ab 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -3518,15 +3518,27 @@ static bool find_delalloc_subrange(struct btrfs_inode *inode, u64 start, u64 end struct extent_map *em; u64 em_end; u64 delalloc_len; + unsigned int outstanding_extents; /* * Search the io tree first for EXTENT_DELALLOC. If we find any, it * means we have delalloc (dirty pages) for which writeback has not * started yet. */ - *delalloc_start_ret = start; - delalloc_len = count_range_bits(&inode->io_tree, delalloc_start_ret, end, - len, EXTENT_DELALLOC, 1); + spin_lock(&inode->lock); + outstanding_extents = inode->outstanding_extents; + + if (inode->delalloc_bytes > 0) { + spin_unlock(&inode->lock); + *delalloc_start_ret = start; + delalloc_len = count_range_bits(&inode->io_tree, + delalloc_start_ret, end, + len, EXTENT_DELALLOC, 1); + } else { + spin_unlock(&inode->lock); + delalloc_len = 0; + } + /* * If delalloc was found then *delalloc_start_ret has a sector size * aligned value (rounded down). @@ -3534,17 +3546,12 @@ static bool find_delalloc_subrange(struct btrfs_inode *inode, u64 start, u64 end if (delalloc_len > 0) *delalloc_end_ret = *delalloc_start_ret + delalloc_len - 1; - spin_lock(&inode->lock); - if (inode->outstanding_extents == 0) { - /* - * No outstanding extents means we don't have any delalloc that - * is flushing, so return the unflushed range found in the io - * tree (if any). - */ - spin_unlock(&inode->lock); + /* + * No outstanding extents means we don't have any delalloc that is + * flushing, so return the unflushed range found in the io tree (if any). + */ + if (outstanding_extents == 0) return (delalloc_len > 0); - } - spin_unlock(&inode->lock); /* * Now also check if there's any extent map in the range that does not From patchwork Mon Oct 10 10:22:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002441 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE6D9C433F5 for ; Mon, 10 Oct 2022 10:22:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231863AbiJJKWm (ORCPT ); Mon, 10 Oct 2022 06:22:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45548 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231398AbiJJKWg (ORCPT ); Mon, 10 Oct 2022 06:22:36 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 880CE6B652 for ; Mon, 10 Oct 2022 03:22:30 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 87CF960EE7 for ; Mon, 10 Oct 2022 10:22:29 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7E673C433C1 for ; Mon, 10 Oct 2022 10:22:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397349; bh=KZB5HiCiAzQDv4MD+fbkXTqvjvIwuYOQUoXd8iAgIfA=; h=From:To:Subject:Date:In-Reply-To:References:From; b=djWfCHdIPbvghQDopPb9C1uL4fPJ8hUo82xEkryfUZuRveKV4btYD9O9aQa6YQEAL wpbG4wqYuFTRLhFfOJDdM3QWQhen2fHUcvw/b7b06y0zV0WU0MeqoAOT6N/n3zU4tD tP2TXAGc4q70ToKtf+Tf6ABolHyUAzMAJIW+oMvC2LwhYIY+BNXTarX8Z4a4DJyKNq h8g9fVn4WCGa52nEWBOBTPBa2uvpn4YC6WdApUDS+kxWmO1EeXHJ16fHK3tuwaXMNb B79DnlMLfsJreFUrlOpErMaOwMjpMwKih0nD3L7BWgAiOlPRcuuePDAQlcd+3UNTHG n9KT3Z5MArAQA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 06/18] btrfs: drop pointless memset when cloning extent buffer Date: Mon, 10 Oct 2022 11:22:08 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At btrfs_clone_extent_buffer(), before allocating the pages array for the new extent buffer we are calling memset() to zero out the pages array of the extent buffer. This is pointless however, because the extent buffer already has every element in its pages array pointing to NULL, as it was allocated with kmem_cache_zalloc(). The memset() was introduced with commit dd137dd1f2d719 ("btrfs: factor out allocating an array of pages"), but even before that commit we already depended on the pages array being initialized to NULL for the error paths that need to call btrfs_release_extent_buffer(). So remove the memset(), it's useless and slightly increases the object text size. Before this change: $ size fs/btrfs/extent_io.o text data bss dec hex filename 70580 5469 40 76089 12939 fs/btrfs/extent_io.o After this change: $ size fs/btrfs/extent_io.o text data bss dec hex filename 70564 5469 40 76073 12929 fs/btrfs/extent_io.o Signed-off-by: Filipe Manana --- fs/btrfs/extent_io.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 1eae68fbae21..c9a9f784d21c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4302,7 +4302,6 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src) */ set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags); - memset(new->pages, 0, sizeof(*new->pages) * num_pages); ret = btrfs_alloc_page_array(num_pages, new->pages); if (ret) { btrfs_release_extent_buffer(new); From patchwork Mon Oct 10 10:22:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002444 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC514C433F5 for ; Mon, 10 Oct 2022 10:22:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231629AbiJJKWu (ORCPT ); Mon, 10 Oct 2022 06:22:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45586 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231835AbiJJKWj (ORCPT ); Mon, 10 Oct 2022 06:22:39 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E0B06B8CA for ; Mon, 10 Oct 2022 03:22:33 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 238C3B80D99 for ; Mon, 10 Oct 2022 10:22:31 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 63AE9C433D6 for ; Mon, 10 Oct 2022 10:22:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397349; bh=Ydd7XrNtQanncJc+agHB/7DLxRHL6YAvBmgB+GRihYM=; h=From:To:Subject:Date:In-Reply-To:References:From; b=mHcU3bot6rS6RRjnDz4fF0PoP99jK5fm6CUzp51eJ5kkvzwFph8MxCwEmt1nj7Peo wTMGaNWNKmo+Ixr9LWbVr8cChHq+kGgCCZYahEPyLM8FJSAqWBHCPuW7PsUR3uhipc 0S1vu+VDTgTrjl6bF5vIRilWdHe0PXH85je8WQRqCYmO8pSMOgCgINzSHof59m2Sel 6ESloV/DRyqXpjXJ6Ju1R3kv4wZXd1idIHcD90fjI9ZkLK1MiA3XgSlnFxWNn7CbhC aH+yU6S3Vaq+2t9SyMArdkDpi3ZbqJ1tYK+ql1e/I97chB+sl+bnoKr1spZGlp/pgm Fc3g60Su441uA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 07/18] btrfs: drop redundant bflags initialization when allocating extent buffer Date: Mon, 10 Oct 2022 11:22:09 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When allocating an extent buffer, at __alloc_extent_buffer(), there's no point in explicitly assigning zero to the bflags field of the new extent buffer because we allocated it with kmem_cache_zalloc(). So just remove the redundant initialization, it saves one mov instruction in the generated assembly code for x86_64 ("movq $0x0,0x10(%rax)"). Signed-off-by: Filipe Manana --- fs/btrfs/extent_io.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c9a9f784d21c..ca67f041e43e 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4269,7 +4269,6 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, eb->start = start; eb->len = len; eb->fs_info = fs_info; - eb->bflags = 0; init_rwsem(&eb->lock); btrfs_leak_debug_add_eb(eb); From patchwork Mon Oct 10 10:22:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002446 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C658C4332F for ; Mon, 10 Oct 2022 10:22:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231656AbiJJKWy (ORCPT ); Mon, 10 Oct 2022 06:22:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45594 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231853AbiJJKWl (ORCPT ); Mon, 10 Oct 2022 06:22:41 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3EE8D52E76 for ; Mon, 10 Oct 2022 03:22:33 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 0B7E6B80E71 for ; Mon, 10 Oct 2022 10:22:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 48B42C433D7 for ; Mon, 10 Oct 2022 10:22:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397350; bh=uAlLAlKffoQVDYeuoy8zlNBzKih62TNzbLGG4AD3PO4=; h=From:To:Subject:Date:In-Reply-To:References:From; b=q3WSnPcvv7bseP3hQ/CpE8x+qNF57FR8zpQeO141fzuR4MuMkBTfxvv9nn/+OXcG/ JMY5ZPPyIhN2lzu1+Fi9kIApSBwfBEmKZyOIbog5vre1Dh8MsGRlIT3UXxEdJaJ1Ow QQIPgCaDxTS6o3WIH1WBh+n9Ah+X7rrObT1v4riMnLO63VIOZD9539S8dt6mrGfUwH CsOIG/cqOW9oMdjX7Liqt2aNzFG4eGkVWmIXug3XPrqmei2qGSAf9KAcNhn9XxRIHL 3mIX4vEOAHmpd7YrAAIUFNtjJlSHIWI2pbqeWdhgNd7mfP6HhwGc0rPDunD5bMuxYc 1HLMlbac0Jpug== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 08/18] btrfs: remove checks for a root with id 0 during backref walking Date: Mon, 10 Oct 2022 11:22:10 +0100 Message-Id: <14b6aa002d9f3bbf4247818a0c5136a70c1102e1.1665396437.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When doing backref walking to determine if an extent is shared, we are testing the root_objectid of the given share_check struct is 0, but that is an impossible case, since btrfs_is_data_extent_shared() always initializes the root_objectid field with the id of the given root, and no root can have an objectid of 0. So remove those checks. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index ac07a6587719..c112c99af1a1 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -704,8 +704,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, continue; } - if (sc && sc->root_objectid && - ref->root_id != sc->root_objectid) { + if (sc && ref->root_id != sc->root_objectid) { free_pref(ref); ret = BACKREF_FOUND_SHARED; goto out; @@ -1317,8 +1316,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, * and would retain their original ref->count < 0. */ if (roots && ref->count && ref->root_id && ref->parent == 0) { - if (sc && sc->root_objectid && - ref->root_id != sc->root_objectid) { + if (sc && ref->root_id != sc->root_objectid) { ret = BACKREF_FOUND_SHARED; goto out; } From patchwork Mon Oct 10 10:22:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002443 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD837C433F5 for ; Mon, 10 Oct 2022 10:22:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231631AbiJJKWt (ORCPT ); Mon, 10 Oct 2022 06:22:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45694 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231806AbiJJKWj (ORCPT ); Mon, 10 Oct 2022 06:22:39 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DB2552810 for ; Mon, 10 Oct 2022 03:22:32 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3DD9760EB1 for ; Mon, 10 Oct 2022 10:22:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2D484C43142 for ; Mon, 10 Oct 2022 10:22:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397351; bh=KLOatonutJdhc2zQ41HbfT1Fc5B9bIxn0dHI21Nma6w=; h=From:To:Subject:Date:In-Reply-To:References:From; b=fZX6+AN7bG/3kf08O3B3OHz5kk/LKOW1IoHkHIuFqXRGPTNYA8qhLzpjZEGfVBXtV NuUTes2gFkOcE43ZnvYATLW49Aut6jGbRNgCW4rcxmHXgXCKJL3SY3ErSjXxuEAGuT v8WdGL0TFbEMNJJiiPh5nAS9th9cxyCCEj1Ipz9DkCWS6Pw+ONc2CmGNE5cvwM6+/F 1oFFIa4RkNOnKyHTUf4Siqoor/r/5FAQy8qeFdveLT2wZUSoneKtgriC7hPdUayOXM 7YivwIS32sszbjnw9FWSicy+EfLdEbehjlvB/OjbrGUdaIylvFKiIDDZkPH/Zf0trU k9iVWX84mk0jA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 09/18] btrfs: remove checks for a 0 inode number during backref walking Date: Mon, 10 Oct 2022 11:22:11 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When doing backref walking to determine if an extent is shared, we are testing if the inode number, stored in the 'inum' field of struct share_check, is 0. However that can never be case, since the all instances of the structure are created at btrfs_is_data_extent_shared(), which always initializes it with the inode number from a fs tree (and the number for any inode from any tree can never be 0). So remove the checks. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index c112c99af1a1..bc3345dae381 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -887,7 +887,7 @@ static int add_delayed_refs(const struct btrfs_fs_info *fs_info, * Found a inum that doesn't match our known inum, we * know it's shared. */ - if (sc && sc->inum && ref->objectid != sc->inum) { + if (sc && ref->objectid != sc->inum) { ret = BACKREF_FOUND_SHARED; goto out; } @@ -1023,7 +1023,7 @@ static int add_inline_refs(const struct btrfs_fs_info *fs_info, key.type = BTRFS_EXTENT_DATA_KEY; key.offset = btrfs_extent_data_ref_offset(leaf, dref); - if (sc && sc->inum && key.objectid != sc->inum) { + if (sc && key.objectid != sc->inum) { ret = BACKREF_FOUND_SHARED; break; } @@ -1122,7 +1122,7 @@ static int add_keyed_refs(struct btrfs_root *extent_root, key.type = BTRFS_EXTENT_DATA_KEY; key.offset = btrfs_extent_data_ref_offset(leaf, dref); - if (sc && sc->inum && key.objectid != sc->inum) { + if (sc && key.objectid != sc->inum) { ret = BACKREF_FOUND_SHARED; break; } From patchwork Mon Oct 10 10:22:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002445 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA8F8C433FE for ; Mon, 10 Oct 2022 10:22:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231443AbiJJKWw (ORCPT ); Mon, 10 Oct 2022 06:22:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231697AbiJJKWk (ORCPT ); Mon, 10 Oct 2022 06:22:40 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D50236B8D2 for ; Mon, 10 Oct 2022 03:22:33 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1DCC760EC9 for ; Mon, 10 Oct 2022 10:22:33 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1223DC433D6 for ; Mon, 10 Oct 2022 10:22:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397352; bh=vjRmnwCztk2q7u258H0PxMtfwEjOM3poX9UPco3p+Hg=; h=From:To:Subject:Date:In-Reply-To:References:From; b=RlmglKs6A/CAsTcfOAURujZ8EVoSbd1WueFnOd864gYLJv5nW+hoeOh4MdcklVMNc AFf6EYQ70tBzXt0fhNLx3C0vSx5tJG7+EYqTJBfKcElUMEua+2REDiu4fvbkXBFFNv h6hyxsXWQ8neQHEf46w0rvdsk3XZJl8sqy4b2gr33D1LrsKU7i0bscVSZi53z4rpGY XR6C5mzeGvQo98ZVpdyIUYUkxqIBTpPKlWQLpKAkLhdgBH4JbtuMvte4ZF3cf55LvO 2EN2xc40r030Do0gP/dy09H3/R7Td5y3IS3CkmBGDzUXZrZPgWWBYuw1ml5WC9FBrb 3iVGrQ2c7xQ4A== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 10/18] btrfs: directly pass the inode to btrfs_is_data_extent_shared() Date: Mon, 10 Oct 2022 11:22:12 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Currently we pass a root and an inode number as arguments for btrfs_is_data_extent_shared() and the inode number is always from an inode that belongs to that root (it wouldn't make sense otherwise). In every context that we call btrfs_is_data_extent_shared() (fiemap only), we have an inode available, so directly pass the inode to the function instead of a root and inode number. This reduces the number of parameters and it makes the function's signature conform to most other functions we have. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 8 ++++---- fs/btrfs/backref.h | 2 +- fs/btrfs/extent_io.c | 16 +++++++--------- 3 files changed, 12 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index bc3345dae381..8effe318cd0c 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1631,8 +1631,7 @@ static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache, /* * Check if a data extent is shared or not. * - * @root: The root the inode belongs to. - * @inum: Number of the inode whose extent we are checking. + * @inode: The inode whose extent we are checking. * @bytenr: Logical bytenr of the extent we are checking. * @extent_gen: Generation of the extent (file extent item) or 0 if it is * not known. @@ -1651,11 +1650,12 @@ static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache, * * Return: 0 if extent is not shared, 1 if it is shared, < 0 on error. */ -int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr, +int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, u64 extent_gen, struct ulist *roots, struct ulist *tmp, struct btrfs_backref_shared_cache *cache) { + struct btrfs_root *root = inode->root; struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_trans_handle *trans; struct ulist_iterator uiter; @@ -1664,7 +1664,7 @@ int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr, int ret = 0; struct share_check shared = { .root_objectid = root->root_key.objectid, - .inum = inum, + .inum = btrfs_ino(inode), .share_count = 0, }; int level; diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 8e69584d538d..c846fa2bdf93 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -77,7 +77,7 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid, u64 start_off, struct btrfs_path *path, struct btrfs_inode_extref **ret_extref, u64 *found_off); -int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr, +int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, u64 extent_gen, struct ulist *roots, struct ulist *tmp, struct btrfs_backref_shared_cache *cache); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index ca67f041e43e..01feeb215bd9 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3715,7 +3715,6 @@ static int fiemap_process_hole(struct btrfs_inode *inode, u64 start, u64 end) { const u64 i_size = i_size_read(&inode->vfs_inode); - const u64 ino = btrfs_ino(inode); u64 cur_offset = start; u64 last_delalloc_end = 0; u32 prealloc_flags = FIEMAP_EXTENT_UNWRITTEN; @@ -3755,8 +3754,8 @@ static int fiemap_process_hole(struct btrfs_inode *inode, if (prealloc_len > 0) { if (!checked_extent_shared && fieinfo->fi_extents_max) { - ret = btrfs_is_data_extent_shared(inode->root, - ino, disk_bytenr, + ret = btrfs_is_data_extent_shared(inode, + disk_bytenr, extent_gen, roots, tmp_ulist, backref_cache); @@ -3805,8 +3804,8 @@ static int fiemap_process_hole(struct btrfs_inode *inode, } if (!checked_extent_shared && fieinfo->fi_extents_max) { - ret = btrfs_is_data_extent_shared(inode->root, - ino, disk_bytenr, + ret = btrfs_is_data_extent_shared(inode, + disk_bytenr, extent_gen, roots, tmp_ulist, backref_cache); @@ -3907,7 +3906,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, const u64 ino = btrfs_ino(inode); struct extent_state *cached_state = NULL; struct btrfs_path *path; - struct btrfs_root *root = inode->root; struct fiemap_cache cache = { 0 }; struct btrfs_backref_shared_cache *backref_cache; struct ulist *roots; @@ -3928,8 +3926,8 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, goto out; } - lockstart = round_down(start, root->fs_info->sectorsize); - lockend = round_up(start + len, root->fs_info->sectorsize); + lockstart = round_down(start, inode->root->fs_info->sectorsize); + lockend = round_up(start + len, inode->root->fs_info->sectorsize); prev_extent_end = lockstart; lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); @@ -4037,7 +4035,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, } else { /* We have a regular extent. */ if (fieinfo->fi_extents_max) { - ret = btrfs_is_data_extent_shared(root, ino, + ret = btrfs_is_data_extent_shared(inode, disk_bytenr, extent_gen, roots, From patchwork Mon Oct 10 10:22:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002448 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D50EAC433FE for ; Mon, 10 Oct 2022 10:22:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231504AbiJJKW4 (ORCPT ); Mon, 10 Oct 2022 06:22:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231703AbiJJKWl (ORCPT ); Mon, 10 Oct 2022 06:22:41 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5AC596B15B for ; Mon, 10 Oct 2022 03:22:34 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 08D0C60ECC for ; Mon, 10 Oct 2022 10:22:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EB0F4C433D7 for ; Mon, 10 Oct 2022 10:22:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397353; bh=Z5PLXP45KImDpmkd8NILFVagq5MWja+59B03vNdEuSk=; h=From:To:Subject:Date:In-Reply-To:References:From; b=eNKyNkjIo+pdgPIm8UDzP4GLEu7W7y2bAbn9B9mQpJqJpz9kjD7E5uY/GT/MW+0t2 NQUtNl8yhZ9PVjgclXisr7YXNprMnSh15Bw2ltEqMXF66phgr1+MLXNk7VT/X3Ytoz Kscsm6jO72d292Z5Bg4Jcsyn190+X3Y0DnDs0Y6d/PiPLTGjoxPfti56ZQRrwCtNAi oCdVkfURbLgHmEPqfyerF6QCfD8TsecMKK3P7p6/E1Yx2wGEESgwbogPhRpWtMA0Qe k/U64GqarsghB+OEaSZUctCBUEsAYgl/WaeQVgpeH79dixDrAUFhOq3lFmmOmrEBtr zHChQ7jfzFmdg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 11/18] btrfs: turn the backref sharedness check cache into a context object Date: Mon, 10 Oct 2022 11:22:13 +0100 Message-Id: <0189681fe561ba8502d7c556bdbb621482b5f1a9.1665396437.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Right now we are using a struct btrfs_backref_shared_cache to pass state across multiple btrfs_is_data_extent_shared() calls. The structure's name closely follows its current purpose, which is to cache previous checks for the sharedness of metadata extents. However we will start using the structure for more things other than caching sharedness checks, so rename it to struct btrfs_backref_share_check_ctx. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 32 ++++++++++++++++---------------- fs/btrfs/backref.h | 8 ++++---- fs/btrfs/extent_io.c | 24 ++++++++++++------------ 3 files changed, 32 insertions(+), 32 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 8effe318cd0c..36a8bc2971b1 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1515,13 +1515,13 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans, * fs_info->commit_root_sem semaphore, so no need to worry about the root's last * snapshot field changing while updating or checking the cache. */ -static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache, +static bool lookup_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx, struct btrfs_root *root, u64 bytenr, int level, bool *is_shared) { struct btrfs_backref_shared_cache_entry *entry; - if (!cache->use_cache) + if (!ctx->use_path_cache) return false; if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)) @@ -1535,7 +1535,7 @@ static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache */ ASSERT(level >= 0); - entry = &cache->entries[level]; + entry = &ctx->path_cache_entries[level]; /* Unused cache entry or being used for some other extent buffer. */ if (entry->bytenr != bytenr) @@ -1568,8 +1568,8 @@ static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache */ if (*is_shared) { for (int i = 0; i < level; i++) { - cache->entries[i].is_shared = true; - cache->entries[i].gen = entry->gen; + ctx->path_cache_entries[i].is_shared = true; + ctx->path_cache_entries[i].gen = entry->gen; } } @@ -1581,14 +1581,14 @@ static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache * fs_info->commit_root_sem semaphore, so no need to worry about the root's last * snapshot field changing while updating or checking the cache. */ -static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache, +static void store_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx, struct btrfs_root *root, u64 bytenr, int level, bool is_shared) { struct btrfs_backref_shared_cache_entry *entry; u64 gen; - if (!cache->use_cache) + if (!ctx->use_path_cache) return; if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)) @@ -1607,7 +1607,7 @@ static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache, else gen = btrfs_root_last_snapshot(&root->root_item); - entry = &cache->entries[level]; + entry = &ctx->path_cache_entries[level]; entry->bytenr = bytenr; entry->is_shared = is_shared; entry->gen = gen; @@ -1621,7 +1621,7 @@ static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache, */ if (is_shared) { for (int i = 0; i < level; i++) { - entry = &cache->entries[i]; + entry = &ctx->path_cache_entries[i]; entry->is_shared = is_shared; entry->gen = gen; } @@ -1637,7 +1637,7 @@ static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache, * not known. * @roots: List of roots this extent is shared among. * @tmp: Temporary list used for iteration. - * @cache: A backref lookup result cache. + * @ctx: A backref sharedness check context. * * btrfs_is_data_extent_shared uses the backref walking code but will short * circuit as soon as it finds a root or inode that doesn't match the @@ -1653,7 +1653,7 @@ static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache, int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, u64 extent_gen, struct ulist *roots, struct ulist *tmp, - struct btrfs_backref_shared_cache *cache) + struct btrfs_backref_share_check_ctx *ctx) { struct btrfs_root *root = inode->root; struct btrfs_fs_info *fs_info = root->fs_info; @@ -1687,7 +1687,7 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, /* -1 means we are in the bytenr of the data extent. */ level = -1; ULIST_ITER_INIT(&uiter); - cache->use_cache = true; + ctx->use_path_cache = true; while (1) { bool is_shared; bool cached; @@ -1698,7 +1698,7 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, /* this is the only condition under which we return 1 */ ret = 1; if (level >= 0) - store_backref_shared_cache(cache, root, bytenr, + store_backref_shared_cache(ctx, root, bytenr, level, true); break; } @@ -1733,17 +1733,17 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, * (which implies multiple paths). */ if (level == -1 && tmp->nnodes > 1) - cache->use_cache = false; + ctx->use_path_cache = false; if (level >= 0) - store_backref_shared_cache(cache, root, bytenr, + store_backref_shared_cache(ctx, root, bytenr, level, false); node = ulist_next(tmp, &uiter); if (!node) break; bytenr = node->val; level++; - cached = lookup_backref_shared_cache(cache, root, bytenr, level, + cached = lookup_backref_shared_cache(ctx, root, bytenr, level, &is_shared); if (cached) { ret = (is_shared ? 1 : 0); diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index c846fa2bdf93..e3a2b45a76e3 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -23,13 +23,13 @@ struct btrfs_backref_shared_cache_entry { bool is_shared; }; -struct btrfs_backref_shared_cache { +struct btrfs_backref_share_check_ctx { /* * A path from a root to a leaf that has a file extent item pointing to * a given data extent should never exceed the maximum b+tree height. */ - struct btrfs_backref_shared_cache_entry entries[BTRFS_MAX_LEVEL]; - bool use_cache; + struct btrfs_backref_shared_cache_entry path_cache_entries[BTRFS_MAX_LEVEL]; + bool use_path_cache; }; typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root, @@ -80,7 +80,7 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid, int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, u64 extent_gen, struct ulist *roots, struct ulist *tmp, - struct btrfs_backref_shared_cache *cache); + struct btrfs_backref_share_check_ctx *ctx); int __init btrfs_prelim_ref_init(void); void __cold btrfs_prelim_ref_exit(void); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 01feeb215bd9..b502a0d3da90 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3708,7 +3708,7 @@ static int fiemap_search_slot(struct btrfs_inode *inode, struct btrfs_path *path static int fiemap_process_hole(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, struct fiemap_cache *cache, - struct btrfs_backref_shared_cache *backref_cache, + struct btrfs_backref_share_check_ctx *backref_ctx, u64 disk_bytenr, u64 extent_offset, u64 extent_gen, struct ulist *roots, struct ulist *tmp_ulist, @@ -3758,7 +3758,7 @@ static int fiemap_process_hole(struct btrfs_inode *inode, disk_bytenr, extent_gen, roots, tmp_ulist, - backref_cache); + backref_ctx); if (ret < 0) return ret; else if (ret > 0) @@ -3808,7 +3808,7 @@ static int fiemap_process_hole(struct btrfs_inode *inode, disk_bytenr, extent_gen, roots, tmp_ulist, - backref_cache); + backref_ctx); if (ret < 0) return ret; else if (ret > 0) @@ -3907,7 +3907,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, struct extent_state *cached_state = NULL; struct btrfs_path *path; struct fiemap_cache cache = { 0 }; - struct btrfs_backref_shared_cache *backref_cache; + struct btrfs_backref_share_check_ctx *backref_ctx; struct ulist *roots; struct ulist *tmp_ulist; u64 last_extent_end; @@ -3917,11 +3917,11 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, bool stopped = false; int ret; - backref_cache = kzalloc(sizeof(*backref_cache), GFP_KERNEL); + backref_ctx = kzalloc(sizeof(*backref_ctx), GFP_KERNEL); path = btrfs_alloc_path(); roots = ulist_alloc(GFP_KERNEL); tmp_ulist = ulist_alloc(GFP_KERNEL); - if (!backref_cache || !path || !roots || !tmp_ulist) { + if (!backref_ctx || !path || !roots || !tmp_ulist) { ret = -ENOMEM; goto out; } @@ -3981,7 +3981,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, const u64 range_end = min(key.offset, lockend) - 1; ret = fiemap_process_hole(inode, fieinfo, &cache, - backref_cache, 0, 0, 0, + backref_ctx, 0, 0, 0, roots, tmp_ulist, prev_extent_end, range_end); if (ret < 0) { @@ -4022,14 +4022,14 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, extent_len, flags); } else if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) { ret = fiemap_process_hole(inode, fieinfo, &cache, - backref_cache, + backref_ctx, disk_bytenr, extent_offset, extent_gen, roots, tmp_ulist, key.offset, extent_end - 1); } else if (disk_bytenr == 0) { /* We have an explicit hole. */ ret = fiemap_process_hole(inode, fieinfo, &cache, - backref_cache, 0, 0, 0, + backref_ctx, 0, 0, 0, roots, tmp_ulist, key.offset, extent_end - 1); } else { @@ -4040,7 +4040,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, extent_gen, roots, tmp_ulist, - backref_cache); + backref_ctx); if (ret < 0) goto out_unlock; else if (ret > 0) @@ -4089,7 +4089,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, path = NULL; if (!stopped && prev_extent_end < lockend) { - ret = fiemap_process_hole(inode, fieinfo, &cache, backref_cache, + ret = fiemap_process_hole(inode, fieinfo, &cache, backref_ctx, 0, 0, 0, roots, tmp_ulist, prev_extent_end, lockend - 1); if (ret < 0) @@ -4122,7 +4122,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, out_unlock: unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); out: - kfree(backref_cache); + kfree(backref_ctx); btrfs_free_path(path); ulist_free(roots); ulist_free(tmp_ulist); From patchwork Mon Oct 10 10:22:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002447 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0679C433F5 for ; Mon, 10 Oct 2022 10:22:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231239AbiJJKWz (ORCPT ); Mon, 10 Oct 2022 06:22:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231717AbiJJKWl (ORCPT ); Mon, 10 Oct 2022 06:22:41 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5AA5B6AE8E for ; Mon, 10 Oct 2022 03:22:35 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id DABF360ECD for ; Mon, 10 Oct 2022 10:22:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D034CC433D6 for ; Mon, 10 Oct 2022 10:22:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397354; bh=UhI3udP8I905R6xN/amdXYC22sBLPRoSEKZzj28VP5A=; h=From:To:Subject:Date:In-Reply-To:References:From; b=AV8mFDPjY4y9ziu3QRez0N3gZTZFo4ceaSnG0mX9mynGnBKzUPG5WLtSQazed7x6s dDSr7hPiuFrUiQb5C1rcRlR830gJtiXVoTcExH7r0s2jI0wINc2qe8ZQQ1dtTKtDc8 ufB7RAdlljOAn5CoyKPDfIwGyPOKMKiExTSxD45pbqocBxmIckqGiYLXy9563GpYZ9 1cCuCSivzy2yeKczz22/FW9aLAMyy4VQk3ak6u5hnyMwIYsoDGJcY7ESSpNWPDbpb3 WBU8R9KM6G7WylpK1i/dW5NhRjIj+CwAgHUP2rMwQr/d/BeEe3aktQGsfg8BkpBG/0 ZEib5l+QmW3fQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 12/18] btrfs: move ulists to data extent sharedness check context Date: Mon, 10 Oct 2022 11:22:14 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When calling btrfs_is_data_extent_shared() we pass two ulists that were allocated by the caller. This is because the single caller, fiemap, calls btrfs_is_data_extent_shared() multiple times and the ulists can be reused, instead of allocating new ones before each call and freeing them after each call. Now that we have a context structure/object that we pass to btrfs_is_data_extent_shared(), we can move those ulists to it, and hide their allocation and the context's allocation in a helper function, as well as the freeing of the ulists and the context object. This allows to reduce the number of parameters passed to btrfs_is_data_extent_shared(), the need to pass the ulists from extent_fiemap() to fiemap_process_hole() and having the caller deal with allocating and releasing the ulists. Also rename one of the ulists from 'tmp' / 'tmp_ulist' to 'refs', since that's a much better name as it reflects what the list is used for (and matching the argument name for find_parent_nodes()). Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 43 ++++++++++++++++++++++++++++++++----------- fs/btrfs/backref.h | 7 ++++++- fs/btrfs/extent_io.c | 34 ++++++++++------------------------ 3 files changed, 48 insertions(+), 36 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 36a8bc2971b1..e71e2f208cc5 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1628,6 +1628,30 @@ static void store_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx } } +struct btrfs_backref_share_check_ctx *btrfs_alloc_backref_share_check_ctx(void) +{ + struct btrfs_backref_share_check_ctx *ctx; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return NULL; + + ulist_init(&ctx->refs); + ulist_init(&ctx->roots); + + return ctx; +} + +void btrfs_free_backref_share_ctx(struct btrfs_backref_share_check_ctx *ctx) +{ + if (!ctx) + return; + + ulist_release(&ctx->refs); + ulist_release(&ctx->roots); + kfree(ctx); +} + /* * Check if a data extent is shared or not. * @@ -1635,8 +1659,6 @@ static void store_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx * @bytenr: Logical bytenr of the extent we are checking. * @extent_gen: Generation of the extent (file extent item) or 0 if it is * not known. - * @roots: List of roots this extent is shared among. - * @tmp: Temporary list used for iteration. * @ctx: A backref sharedness check context. * * btrfs_is_data_extent_shared uses the backref walking code but will short @@ -1652,7 +1674,6 @@ static void store_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx */ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, u64 extent_gen, - struct ulist *roots, struct ulist *tmp, struct btrfs_backref_share_check_ctx *ctx) { struct btrfs_root *root = inode->root; @@ -1669,8 +1690,8 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, }; int level; - ulist_init(roots); - ulist_init(tmp); + ulist_init(&ctx->roots); + ulist_init(&ctx->refs); trans = btrfs_join_transaction_nostart(root); if (IS_ERR(trans)) { @@ -1692,8 +1713,8 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, bool is_shared; bool cached; - ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, tmp, - roots, NULL, &shared, false); + ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, &ctx->refs, + &ctx->roots, NULL, &shared, false); if (ret == BACKREF_FOUND_SHARED) { /* this is the only condition under which we return 1 */ ret = 1; @@ -1732,13 +1753,13 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, * deal with), we can not use it if we have multiple leaves * (which implies multiple paths). */ - if (level == -1 && tmp->nnodes > 1) + if (level == -1 && ctx->refs.nnodes > 1) ctx->use_path_cache = false; if (level >= 0) store_backref_shared_cache(ctx, root, bytenr, level, false); - node = ulist_next(tmp, &uiter); + node = ulist_next(&ctx->refs, &uiter); if (!node) break; bytenr = node->val; @@ -1760,8 +1781,8 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, up_read(&fs_info->commit_root_sem); } out: - ulist_release(roots); - ulist_release(tmp); + ulist_release(&ctx->roots); + ulist_release(&ctx->refs); return ret; } diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index e3a2b45a76e3..8da0ba6b94a4 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -24,6 +24,9 @@ struct btrfs_backref_shared_cache_entry { }; struct btrfs_backref_share_check_ctx { + /* Ulists used during backref walking. */ + struct ulist refs; + struct ulist roots; /* * A path from a root to a leaf that has a file extent item pointing to * a given data extent should never exceed the maximum b+tree height. @@ -35,6 +38,9 @@ struct btrfs_backref_share_check_ctx { typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root, void *ctx); +struct btrfs_backref_share_check_ctx *btrfs_alloc_backref_share_check_ctx(void); +void btrfs_free_backref_share_ctx(struct btrfs_backref_share_check_ctx *ctx); + int extent_from_logical(struct btrfs_fs_info *fs_info, u64 logical, struct btrfs_path *path, struct btrfs_key *found_key, u64 *flags); @@ -79,7 +85,6 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid, u64 *found_off); int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, u64 extent_gen, - struct ulist *roots, struct ulist *tmp, struct btrfs_backref_share_check_ctx *ctx); int __init btrfs_prelim_ref_init(void); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index b502a0d3da90..de62bdf11b33 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3711,7 +3711,6 @@ static int fiemap_process_hole(struct btrfs_inode *inode, struct btrfs_backref_share_check_ctx *backref_ctx, u64 disk_bytenr, u64 extent_offset, u64 extent_gen, - struct ulist *roots, struct ulist *tmp_ulist, u64 start, u64 end) { const u64 i_size = i_size_read(&inode->vfs_inode); @@ -3755,10 +3754,9 @@ static int fiemap_process_hole(struct btrfs_inode *inode, if (prealloc_len > 0) { if (!checked_extent_shared && fieinfo->fi_extents_max) { ret = btrfs_is_data_extent_shared(inode, - disk_bytenr, - extent_gen, roots, - tmp_ulist, - backref_ctx); + disk_bytenr, + extent_gen, + backref_ctx); if (ret < 0) return ret; else if (ret > 0) @@ -3806,8 +3804,7 @@ static int fiemap_process_hole(struct btrfs_inode *inode, if (!checked_extent_shared && fieinfo->fi_extents_max) { ret = btrfs_is_data_extent_shared(inode, disk_bytenr, - extent_gen, roots, - tmp_ulist, + extent_gen, backref_ctx); if (ret < 0) return ret; @@ -3908,8 +3905,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, struct btrfs_path *path; struct fiemap_cache cache = { 0 }; struct btrfs_backref_share_check_ctx *backref_ctx; - struct ulist *roots; - struct ulist *tmp_ulist; u64 last_extent_end; u64 prev_extent_end; u64 lockstart; @@ -3917,11 +3912,9 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, bool stopped = false; int ret; - backref_ctx = kzalloc(sizeof(*backref_ctx), GFP_KERNEL); + backref_ctx = btrfs_alloc_backref_share_check_ctx(); path = btrfs_alloc_path(); - roots = ulist_alloc(GFP_KERNEL); - tmp_ulist = ulist_alloc(GFP_KERNEL); - if (!backref_ctx || !path || !roots || !tmp_ulist) { + if (!backref_ctx || !path) { ret = -ENOMEM; goto out; } @@ -3982,7 +3975,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, ret = fiemap_process_hole(inode, fieinfo, &cache, backref_ctx, 0, 0, 0, - roots, tmp_ulist, prev_extent_end, range_end); if (ret < 0) { goto out_unlock; @@ -4024,13 +4016,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, ret = fiemap_process_hole(inode, fieinfo, &cache, backref_ctx, disk_bytenr, extent_offset, - extent_gen, roots, tmp_ulist, - key.offset, extent_end - 1); + extent_gen, key.offset, + extent_end - 1); } else if (disk_bytenr == 0) { /* We have an explicit hole. */ ret = fiemap_process_hole(inode, fieinfo, &cache, backref_ctx, 0, 0, 0, - roots, tmp_ulist, key.offset, extent_end - 1); } else { /* We have a regular extent. */ @@ -4038,8 +4029,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, ret = btrfs_is_data_extent_shared(inode, disk_bytenr, extent_gen, - roots, - tmp_ulist, backref_ctx); if (ret < 0) goto out_unlock; @@ -4090,8 +4079,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, if (!stopped && prev_extent_end < lockend) { ret = fiemap_process_hole(inode, fieinfo, &cache, backref_ctx, - 0, 0, 0, roots, tmp_ulist, - prev_extent_end, lockend - 1); + 0, 0, 0, prev_extent_end, lockend - 1); if (ret < 0) goto out_unlock; prev_extent_end = lockend; @@ -4122,10 +4110,8 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, out_unlock: unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); out: - kfree(backref_ctx); + btrfs_free_backref_share_ctx(backref_ctx); btrfs_free_path(path); - ulist_free(roots); - ulist_free(tmp_ulist); return ret; } From patchwork Mon Oct 10 10:22:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002450 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94D51C433FE for ; Mon, 10 Oct 2022 10:23:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231716AbiJJKW7 (ORCPT ); Mon, 10 Oct 2022 06:22:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231865AbiJJKWp (ORCPT ); Mon, 10 Oct 2022 06:22:45 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E5D7C6B15D for ; Mon, 10 Oct 2022 03:22:37 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 8668DB80E7B for ; Mon, 10 Oct 2022 10:22:36 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B5325C433C1 for ; Mon, 10 Oct 2022 10:22:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397355; bh=7liV9PxLlVyzzPNBdy1UwWJdAfEaPqVXTawQCVGYsbw=; h=From:To:Subject:Date:In-Reply-To:References:From; b=HdHfHwlJ7UeNdzd+wkmpMKdmay5GgmEJm55R5S4zs9UiynIYWtIl0uZUhIZIi05o1 oTYkPDE9FlekoDv6JjARsSgCq/3EI13/iMA4pvpcT4Gm/ONqd2HmnhKXmN3H5tMBLp aH3DwLJPBPz4fabzbfkQURcqNNznT6nDCUO+WdoTvIKKcwO6AMyBnXGJCXQd2VuFuJ qPmJBf6SYqREREygawFErqZAcMSIRws7nmykvZ3hFuvqOOkicl3sFDZdreRwoE5qnK fTpo+EOOFEiXDl/rGvmsjKQOeS2Gbi4gUTo9PqrQqYNquF/bANtQ2giP7QwOFKl7xC kwFcwLoEM9G7A== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 13/18] btrfs: remove roots ulist when checking data extent sharedness Date: Mon, 10 Oct 2022 11:22:15 +0100 Message-Id: <3c22f241927dd971854881fb5827197ee544f88c.1665396437.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Currently btrfs_is_data_extent_shared() is passing a ulist for the roots argument of find_parent_nodes(), however it does not use that ulist for anything and for this context that list always ends up with at most one element. Since find_parent_nodes() is able to deal with a NULL ulist for its roots argument, make btrfs_is_data_extent_shared() pass it NULL and avoid the burden of allocating memory for the unnused roots ulist, initializing it, releasing it and allocating one struct ulist_node for it during the call to find_parent_nodes(). Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 6 +----- fs/btrfs/backref.h | 1 - 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index e71e2f208cc5..c2ec132c2a9c 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1637,7 +1637,6 @@ struct btrfs_backref_share_check_ctx *btrfs_alloc_backref_share_check_ctx(void) return NULL; ulist_init(&ctx->refs); - ulist_init(&ctx->roots); return ctx; } @@ -1648,7 +1647,6 @@ void btrfs_free_backref_share_ctx(struct btrfs_backref_share_check_ctx *ctx) return; ulist_release(&ctx->refs); - ulist_release(&ctx->roots); kfree(ctx); } @@ -1690,7 +1688,6 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, }; int level; - ulist_init(&ctx->roots); ulist_init(&ctx->refs); trans = btrfs_join_transaction_nostart(root); @@ -1714,7 +1711,7 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, bool cached; ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, &ctx->refs, - &ctx->roots, NULL, &shared, false); + NULL, NULL, &shared, false); if (ret == BACKREF_FOUND_SHARED) { /* this is the only condition under which we return 1 */ ret = 1; @@ -1781,7 +1778,6 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, up_read(&fs_info->commit_root_sem); } out: - ulist_release(&ctx->roots); ulist_release(&ctx->refs); return ret; } diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 8da0ba6b94a4..5f468f0defda 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -26,7 +26,6 @@ struct btrfs_backref_shared_cache_entry { struct btrfs_backref_share_check_ctx { /* Ulists used during backref walking. */ struct ulist refs; - struct ulist roots; /* * A path from a root to a leaf that has a file extent item pointing to * a given data extent should never exceed the maximum b+tree height. From patchwork Mon Oct 10 10:22:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002449 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 55158C433F5 for ; Mon, 10 Oct 2022 10:22:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231703AbiJJKW5 (ORCPT ); Mon, 10 Oct 2022 06:22:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45804 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231646AbiJJKWm (ORCPT ); Mon, 10 Oct 2022 06:22:42 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 445B35A816 for ; Mon, 10 Oct 2022 03:22:37 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id A66FA60ED3 for ; Mon, 10 Oct 2022 10:22:36 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 99C3FC433D6 for ; Mon, 10 Oct 2022 10:22:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397356; bh=x62pqM/XRdoE8d/3+vyjhdSKtlk/VMUcD20O+luvXm0=; h=From:To:Subject:Date:In-Reply-To:References:From; b=hG9BJjrLARSX2ZtiyEmwIflLxKr5McPcjVI5464dGz1M5ZYVZeFHrtMJJSj7FpN+y zajPoHles4lIZarqgcVcQRxMzuniG9Sbv86AOBQPhgs4QzcOSJx55pJ0jA5LpBgc+q 9B/JaWUD8lXOfpVIIP72OG91SzAx5968Y7hujeEokbg5per2b21R0JKodvwYVfUuLT MeOdD0tGCjuFSoENw2fgzMajtZraWWC3ZQn3pJTJHPvYrJmybYX8xJsiNoecsloUPJ glbwSzk+g5JNr7SzRj4aFOttMozKklmS23JLuAyYPr1dtfzjEPB7OEQvEAlCk8PDCo JvilbTaps3tFA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 14/18] btrfs: remove useless logic when finding parent nodes Date: Mon, 10 Oct 2022 11:22:16 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana At find_parent_nodes(), at its last step, when iterating over all direct references, we are checking if we have a share context and if we have a reference with a different root from the one in the share context. However that logic is pointless because of two reasons: 1) After the previous patch in the series (subject "btrfs: remove roots ulist when checking data extent sharedness"), the roots argument is always NULL when using a share check context (struct share_check), so this code is never triggered; 2) Even before that previous patch, we could not hit this code because if we had a reference with a root different from the one in our share context, then we would have exited earlier when doing either of the following: - Adding a second direct ref to the direct refs red black tree resulted in extent_is_shared() returning true when called from add_direct_ref() -> add_prelim_ref(), after processing delayed references or while processing references in the extent tree; - When adding a second reference to the indirect refs red black tree (same as above, extent_is_shared() returns true); - If we only have one indirect reference and no direct references, then when resolving it at resolve_indirect_refs() we immediately return that the target extent is shared, therefore never reaching that loop that iterates over all direct references at find_parent_nodes(); - If we have 1 indirect reference and 1 direct reference, then we also exit early because extent_is_shared() ends up returning true when called through add_prelim_ref() (by add_direct_ref() or add_indirect_ref()) or add_delayed_refs(). Same applies as when having a combination of direct, indirect and indirect with missing key references. This logic had been obsoleted since commit 3ec4d3238ab165 ("btrfs: allow backref search checks for shared extents"), which introduced the early exits in case an extent is shared. So just remove that logic, and assert at find_parent_nodes() that when we have a share context we don't have a roots ulist and that we haven't found the extent to be directly shared after processing delayed references and all references from the extent tree. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index c2ec132c2a9c..9332401affab 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1188,6 +1188,10 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, .indirect_missing_keys = PREFTREE_INIT }; + /* Roots ulist is not needed when using a sharedness check context. */ + if (sc) + ASSERT(roots == NULL); + key.objectid = bytenr; key.offset = (u64)-1; if (btrfs_fs_incompat(fs_info, SKINNY_METADATA)) @@ -1279,6 +1283,20 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, } } + /* + * If we have a share context and we reached here, it means the extent + * is not directly shared (no multiple reference items for it), + * otherwise we would have exited earlier with a return value of + * BACKREF_FOUND_SHARED after processing delayed references or while + * processing inline or keyed references from the extent tree. + * The extent may however be indirectly shared through shared subtrees + * as a result from creating snapshots, so we determine below what is + * its parent node, in case we are dealing with a metadata extent, or + * what's the leaf (or leaves), from a fs tree, that has a file extent + * item pointing to it in case we are dealing with a data extent. + */ + ASSERT(extent_is_shared(sc) == 0); + btrfs_release_path(path); ret = add_missing_keys(fs_info, &preftrees, path->skip_locking == 0); @@ -1316,11 +1334,6 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, * and would retain their original ref->count < 0. */ if (roots && ref->count && ref->root_id && ref->parent == 0) { - if (sc && ref->root_id != sc->root_objectid) { - ret = BACKREF_FOUND_SHARED; - goto out; - } - /* no parent == root of tree */ ret = ulist_add(roots, ref->root_id, 0, GFP_NOFS); if (ret < 0) From patchwork Mon Oct 10 10:22:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002452 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A68DC4332F for ; Mon, 10 Oct 2022 10:23:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231720AbiJJKXB (ORCPT ); Mon, 10 Oct 2022 06:23:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231892AbiJJKWq (ORCPT ); Mon, 10 Oct 2022 06:22:46 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3047C6AA3E for ; Mon, 10 Oct 2022 03:22:38 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 8C73760EB1 for ; Mon, 10 Oct 2022 10:22:37 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7ED4EC433B5 for ; Mon, 10 Oct 2022 10:22:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397357; bh=CeHfKSAPw8H9xHzHQLWFFMTdB5oPhrdTw7BswKibY5M=; h=From:To:Subject:Date:In-Reply-To:References:From; b=ZjLFhEzeR7FwN5pQ5sa18bhuaaiXSeVSjVL6NUwtM+ByyUpD5wwqpIcf8eBu+xEpp J63MNr4Xu4oSmmzIAuWcQQl1VQyxKYKPFtpiQVoQBxwryn0SbFOVmrJSW3m7lFV0e+ oD9ZSoDEZoE1cMwYMd0CwnYTPxSjNdJ0br+UjIR3A0tSLsuv6ON1VxVvo+Jn3F7q73 L1eUNQlgnS2t/gQ34LP6/sf/UoTn9zHKgc57H2yFNRq3gA6iiKpq8CfCJu2X0LzQAh bdAIlEQuksGTVUzua4+3zHcczXDINRjvbUa4PSDCthiL376XBzxgmjf93xSgA6GfOi oG7D1/n4bBJ/A== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 15/18] btrfs: cache sharedness of the last few data extents during fiemap Date: Mon, 10 Oct 2022 11:22:17 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During fiemap we process all the file extent items of an inode, by their file offset order (left to right b+tree order), and then check if the data extent they point at is shared or not. Until now we didn't cache those results, we only did it for b+tree nodes/leaves since for each unique b+tree path we have access to hundreds of file extent items. However, it is also common to repeat checking the sharedness of a particular data extent in a very short time window, and the cases that lead to that are the following: 1) COW writes. If have a file extent item like this: [ bytenr X, offset = 0, num_bytes = 512K ] file offset 0 512K Then a 4K write into file offset 64K happens, we end up with the following file extent item layout: [ bytenr X, offset = 0, num_bytes = 64K ] file offset 0 64K [ bytenr Y, offset = 0, num_bytes = 4K ] file offset 64K 68K [ bytenr X, offset = 68K, num_bytes = 444K ] file offset 68K 512K So during fiemap we well check for the sharedness of the data extent with bytenr X twice. Typically for COW writes and for at least moderately updated files, we end up with many file extent items that point to different sections of the same data extent. 2) Writing into a NOCOW file after a snapshot is taken. This happens if the target extent was created in a generation older than the generation where the last snapshot for the root (the tree the inode belongs to) was made. This leads to a scenario like the previous one. 3) Writing into sections of a preallocated extent. For example if a file has the following layout: [ bytenr X, offset = 0, num_bytes = 1M, type = prealloc ] 0 1M After doing a 4K write into file offset 0 and another 4K write into offset 512K, we get the following layout: [ bytenr X, offset = 0, num_bytes = 4K, type = regular ] 0 4K [ bytenr X, offset = 4K, num_bytes = 508K, type = prealloc ] 4K 512K [ bytenr X, offset = 512K, num_bytes = 4K, type = regular ] 512K 516K [ bytenr X, offset = 516K, num_bytes = 508K, type = prealloc ] 516K 1M So we end up with 4 consecutive file extent items pointing to the data extent at bytenr X. 4) Hole punching in the middle of an extent. For example if a file has the following file extent item: [ bytenr X, offset = 0, num_bytes = 8M ] 0 8M And then hole is punched for the file range [4M, 6M[, we our file extent item split into two: [ bytenr X, offset = 0, num_bytes = 4M ] 0 4M [ 2M hole, implicit or explicit depending on NO_HOLES feature ] 4M 6M [ bytenr X, offset = 6M, num_bytes = 2M ] 6M 8M Again, we end up with two file extent items pointing to the same data extent. 5) When reflinking (clone and deduplication) within the same file. This is probably the least common case of all. In cases 1, 2, 4 and 4, when we have multiple file extent items that point to the same data extent, their distance is usually short, typically separated by a few slots in a b+tree leaf (or across sibling leaves). For case 5, the distance can vary a lot, but it's typically the less common case. This change caches the result of the sharedness checks for data extents, but only for the last 8 extents that we notice that our inode refers to with multiple file extent items. Whenever we want to check if a data extent is shared, we lookup the cache which consists of doing a linear scan of an 8 elements array, and if we find the data extent there, we return the result and don't check the extent tree and delayed refs. The array/cache is small so that doing the search has no noticeable negative impact on the performance in case we don't have file extent items within a distance of 8 slots that point to the same data extent. Slots in the cache/array are overwritten in a simple round robin fashion, as that approach fits very well. Using this simple approach with only the last 8 data extents seen is effective as usually when multiple file extents items point to the same data extent, their distance is within 8 slots. It also uses very little memory and the time to cache a result or lookup the cache is negligible. The following test was run on non-debug kernel (Debian's default kernel config) to measure the impact in the case of COW writes (first example given above), where we run fiemap after overwriting 33% of the blocks of a file: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi umount $DEV &> /dev/null mkfs.btrfs -f $DEV mount $DEV $MNT FILE_SIZE=$((1 * 1024 * 1024 * 1024)) # Create the file full of 1M extents. xfs_io -f -s -c "pwrite -b 1M -S 0xab 0 $FILE_SIZE" $MNT/foobar block_count=$((FILE_SIZE / 4096)) # Overwrite about 33% of the file blocks. overwrite_count=$((block_count / 3)) echo -e "\nOverwriting $overwrite_count 4K blocks (out of $block_count)..." RANDOM=123 for ((i = 1; i <= $overwrite_count; i++)); do off=$(((RANDOM % block_count) * 4096)) xfs_io -c "pwrite -S 0xcd $off 4K" $MNT/foobar > /dev/null echo -ne "\r$i blocks overwritten..." done echo -e "\n" # Unmount and mount to clear all cached metadata. umount $MNT mount $DEV $MNT start=$(date +%s%N) filefrag $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds" umount $MNT Result before applying this patch: fiemap took 128 milliseconds Result after applying this patch: fiemap took 92 milliseconds (-28.1%) The test is somewhat limited in the sense the gains may be higher in practice, because in the test the filesystem is small, so we have small fs and extent trees, plus there's no concurrent access to the trees as well, therefore no lock contention there. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 50 +++++++++++++++++++++++++++++++++++++++++++--- fs/btrfs/backref.h | 27 +++++++++++++++++++++++++ 2 files changed, 74 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 9332401affab..8d4e7ddbefb4 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -137,7 +137,25 @@ struct preftrees { struct share_check { u64 root_objectid; u64 inum; + u64 data_bytenr; + /* + * Counts number of inodes that refer to an extent (different inodes in + * the same root or different roots) that we could find. The sharedness + * check typically stops once this counter gets greater than 1, so it + * may not reflect the total number of inodes. + */ int share_count; + /* + * The number of times we found our inode refers to the data extent we + * are determining the sharedness. In other words, how many file extent + * items we could find for our inode that point to our target data + * extent. The value we get here after finishing the extent sharedness + * check may be smaller than reality, but if it ends up being greater + * than 1, then we know for sure the inode has multiple file extent + * items that point to our inode, and we can safely assume it's useful + * to cache the sharedness check result. + */ + int self_ref_count; }; static inline int extent_is_shared(struct share_check *sc) @@ -206,7 +224,7 @@ static int prelim_ref_compare(struct prelim_ref *ref1, } static void update_share_count(struct share_check *sc, int oldcount, - int newcount) + int newcount, struct prelim_ref *newref) { if ((!sc) || (oldcount == 0 && newcount < 1)) return; @@ -215,6 +233,11 @@ static void update_share_count(struct share_check *sc, int oldcount, sc->share_count--; else if (oldcount < 1 && newcount > 0) sc->share_count++; + + if (newref->root_id == sc->root_objectid && + newref->wanted_disk_byte == sc->data_bytenr && + newref->key_for_search.objectid == sc->inum) + sc->self_ref_count += newref->count; } /* @@ -265,14 +288,14 @@ static void prelim_ref_insert(const struct btrfs_fs_info *fs_info, * BTRFS_[ADD|DROP]_DELAYED_REF actions. */ update_share_count(sc, ref->count, - ref->count + newref->count); + ref->count + newref->count, newref); ref->count += newref->count; free_pref(newref); return; } } - update_share_count(sc, 0, newref->count); + update_share_count(sc, 0, newref->count, newref); preftree->count++; trace_btrfs_prelim_ref_insert(fs_info, newref, NULL, preftree->count); rb_link_node(&newref->rbnode, parent, p); @@ -1697,10 +1720,17 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, struct share_check shared = { .root_objectid = root->root_key.objectid, .inum = btrfs_ino(inode), + .data_bytenr = bytenr, .share_count = 0, + .self_ref_count = 0, }; int level; + for (int i = 0; i < BTRFS_BACKREF_CTX_PREV_EXTENTS_SIZE; i++) { + if (ctx->prev_extents_cache[i].bytenr == bytenr) + return ctx->prev_extents_cache[i].is_shared; + } + ulist_init(&ctx->refs); trans = btrfs_join_transaction_nostart(root); @@ -1784,6 +1814,20 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, cond_resched(); } + /* + * Cache the sharedness result for the data extent if we know our inode + * has more than 1 file extent item that refers to the data extent. + */ + if (ret >= 0 && shared.self_ref_count > 1) { + int slot = ctx->prev_extents_cache_slot; + + ctx->prev_extents_cache[slot].bytenr = shared.data_bytenr; + ctx->prev_extents_cache[slot].is_shared = (ret == 1); + + slot = (slot + 1) % BTRFS_BACKREF_CTX_PREV_EXTENTS_SIZE; + ctx->prev_extents_cache_slot = slot; + } + if (trans) { btrfs_put_tree_mod_seq(fs_info, &elem); btrfs_end_transaction(trans); diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 5f468f0defda..fda78db50be6 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -23,6 +23,8 @@ struct btrfs_backref_shared_cache_entry { bool is_shared; }; +#define BTRFS_BACKREF_CTX_PREV_EXTENTS_SIZE 8 + struct btrfs_backref_share_check_ctx { /* Ulists used during backref walking. */ struct ulist refs; @@ -32,6 +34,31 @@ struct btrfs_backref_share_check_ctx { */ struct btrfs_backref_shared_cache_entry path_cache_entries[BTRFS_MAX_LEVEL]; bool use_path_cache; + /* + * Cache the sharedness result for the last few extents we have found, + * but only for extents for which we have multiple file extent items + * that point to them. + * It's very common to have several file extent items that point to the + * same extent (bytenr) but with different offsets and lengths. This + * typically happens for COW writes, partial writes into prealloc + * extents, NOCOW writes after snapshoting a root, hole punching or + * reflinking within the same file (less common perhaps). + * So keep a small cache with the lookup results for the extent pointed + * by the last few file extent items. This cache is checked, with a + * linear scan, whenever btrfs_is_data_extent_shared() is called, so + * it must be small so that it does not negatively affect performance in + * case we don't have multiple file extent items that point to the same + * data extent. + */ + struct { + u64 bytenr; + bool is_shared; + } prev_extents_cache[BTRFS_BACKREF_CTX_PREV_EXTENTS_SIZE]; + /* + * The slot in the prev_extents_cache array that will be used for + * storing the sharedness result of a new data extent. + */ + int prev_extents_cache_slot; }; typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root, From patchwork Mon Oct 10 10:22:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002451 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A1813C433F5 for ; Mon, 10 Oct 2022 10:23:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231244AbiJJKXA (ORCPT ); Mon, 10 Oct 2022 06:23:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45564 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231864AbiJJKWp (ORCPT ); Mon, 10 Oct 2022 06:22:45 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C0566B163 for ; Mon, 10 Oct 2022 03:22:38 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6DED560EC9 for ; Mon, 10 Oct 2022 10:22:38 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 642D6C433C1 for ; Mon, 10 Oct 2022 10:22:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397357; bh=h6zog3RQEke90JQcSjx/mwsCneNlC2/aILgaCxXAeNI=; h=From:To:Subject:Date:In-Reply-To:References:From; b=SV2PYRgCSxlMRrq49xMVxI597DYNlP0PEWs2U2WGfXf6pvhOicP3/G/4naaVHiwEU iEyCMdpcUqT83Mfu/MISZYKOeK/vfPQkf4S56T4OPY6jtkPE0cdQ5/pkS+l+up0usD EYjJvT6HUITZnIesatVGFt0VrWkjrpE5vJ/K8Cz4lNtZPvDT8Ebw0ei49u7sBC9ryd PgyhCIb5oCHCLytJnIQ7Wu8cGiRaO2uOMWtwD0lLHBtrh0ag7tnufKIC/iKX87HX8g l0R4n7uiZ8dgyIt8xPrNnDwqxaM9pSDvihp0xQfzUROccth+/M+vu+qm4zJ8AJxBze WlL5Ulh7THuLA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 16/18] btrfs: move up backref sharedness cache store and lookup functions Date: Mon, 10 Oct 2022 11:22:18 +0100 Message-Id: <4feb670d89ca1506088d461df2db30fbb40d689e.1665396437.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana Move the static functions to lookup and store sharedness check of an extent buffer to a location above find_all_parents(), because in the next patch the lookup function will be used by find_all_parents(). The store function is also moved just because it's the counter part to the lookup function and it's best to have their definitions close together. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 236 ++++++++++++++++++++++----------------------- 1 file changed, 118 insertions(+), 118 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 8d4e7ddbefb4..977f07903156 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1167,6 +1167,124 @@ static int add_keyed_refs(struct btrfs_root *extent_root, return ret; } +/* + * The caller has joined a transaction or is holding a read lock on the + * fs_info->commit_root_sem semaphore, so no need to worry about the root's last + * snapshot field changing while updating or checking the cache. + */ +static bool lookup_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx, + struct btrfs_root *root, + u64 bytenr, int level, bool *is_shared) +{ + struct btrfs_backref_shared_cache_entry *entry; + + if (!ctx->use_path_cache) + return false; + + if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)) + return false; + + /* + * Level -1 is used for the data extent, which is not reliable to cache + * because its reference count can increase or decrease without us + * realizing. We cache results only for extent buffers that lead from + * the root node down to the leaf with the file extent item. + */ + ASSERT(level >= 0); + + entry = &ctx->path_cache_entries[level]; + + /* Unused cache entry or being used for some other extent buffer. */ + if (entry->bytenr != bytenr) + return false; + + /* + * We cached a false result, but the last snapshot generation of the + * root changed, so we now have a snapshot. Don't trust the result. + */ + if (!entry->is_shared && + entry->gen != btrfs_root_last_snapshot(&root->root_item)) + return false; + + /* + * If we cached a true result and the last generation used for dropping + * a root changed, we can not trust the result, because the dropped root + * could be a snapshot sharing this extent buffer. + */ + if (entry->is_shared && + entry->gen != btrfs_get_last_root_drop_gen(root->fs_info)) + return false; + + *is_shared = entry->is_shared; + /* + * If the node at this level is shared, than all nodes below are also + * shared. Currently some of the nodes below may be marked as not shared + * because we have just switched from one leaf to another, and switched + * also other nodes above the leaf and below the current level, so mark + * them as shared. + */ + if (*is_shared) { + for (int i = 0; i < level; i++) { + ctx->path_cache_entries[i].is_shared = true; + ctx->path_cache_entries[i].gen = entry->gen; + } + } + + return true; +} + +/* + * The caller has joined a transaction or is holding a read lock on the + * fs_info->commit_root_sem semaphore, so no need to worry about the root's last + * snapshot field changing while updating or checking the cache. + */ +static void store_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx, + struct btrfs_root *root, + u64 bytenr, int level, bool is_shared) +{ + struct btrfs_backref_shared_cache_entry *entry; + u64 gen; + + if (!ctx->use_path_cache) + return; + + if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)) + return; + + /* + * Level -1 is used for the data extent, which is not reliable to cache + * because its reference count can increase or decrease without us + * realizing. We cache results only for extent buffers that lead from + * the root node down to the leaf with the file extent item. + */ + ASSERT(level >= 0); + + if (is_shared) + gen = btrfs_get_last_root_drop_gen(root->fs_info); + else + gen = btrfs_root_last_snapshot(&root->root_item); + + entry = &ctx->path_cache_entries[level]; + entry->bytenr = bytenr; + entry->is_shared = is_shared; + entry->gen = gen; + + /* + * If we found an extent buffer is shared, set the cache result for all + * extent buffers below it to true. As nodes in the path are COWed, + * their sharedness is moved to their children, and if a leaf is COWed, + * then the sharedness of a data extent becomes direct, the refcount of + * data extent is increased in the extent item at the extent tree. + */ + if (is_shared) { + for (int i = 0; i < level; i++) { + entry = &ctx->path_cache_entries[i]; + entry->is_shared = is_shared; + entry->gen = gen; + } + } +} + /* * this adds all existing backrefs (inline backrefs, backrefs and delayed * refs) for the given bytenr to the refs list, merges duplicates and resolves @@ -1546,124 +1664,6 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans, return ret; } -/* - * The caller has joined a transaction or is holding a read lock on the - * fs_info->commit_root_sem semaphore, so no need to worry about the root's last - * snapshot field changing while updating or checking the cache. - */ -static bool lookup_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx, - struct btrfs_root *root, - u64 bytenr, int level, bool *is_shared) -{ - struct btrfs_backref_shared_cache_entry *entry; - - if (!ctx->use_path_cache) - return false; - - if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)) - return false; - - /* - * Level -1 is used for the data extent, which is not reliable to cache - * because its reference count can increase or decrease without us - * realizing. We cache results only for extent buffers that lead from - * the root node down to the leaf with the file extent item. - */ - ASSERT(level >= 0); - - entry = &ctx->path_cache_entries[level]; - - /* Unused cache entry or being used for some other extent buffer. */ - if (entry->bytenr != bytenr) - return false; - - /* - * We cached a false result, but the last snapshot generation of the - * root changed, so we now have a snapshot. Don't trust the result. - */ - if (!entry->is_shared && - entry->gen != btrfs_root_last_snapshot(&root->root_item)) - return false; - - /* - * If we cached a true result and the last generation used for dropping - * a root changed, we can not trust the result, because the dropped root - * could be a snapshot sharing this extent buffer. - */ - if (entry->is_shared && - entry->gen != btrfs_get_last_root_drop_gen(root->fs_info)) - return false; - - *is_shared = entry->is_shared; - /* - * If the node at this level is shared, than all nodes below are also - * shared. Currently some of the nodes below may be marked as not shared - * because we have just switched from one leaf to another, and switched - * also other nodes above the leaf and below the current level, so mark - * them as shared. - */ - if (*is_shared) { - for (int i = 0; i < level; i++) { - ctx->path_cache_entries[i].is_shared = true; - ctx->path_cache_entries[i].gen = entry->gen; - } - } - - return true; -} - -/* - * The caller has joined a transaction or is holding a read lock on the - * fs_info->commit_root_sem semaphore, so no need to worry about the root's last - * snapshot field changing while updating or checking the cache. - */ -static void store_backref_shared_cache(struct btrfs_backref_share_check_ctx *ctx, - struct btrfs_root *root, - u64 bytenr, int level, bool is_shared) -{ - struct btrfs_backref_shared_cache_entry *entry; - u64 gen; - - if (!ctx->use_path_cache) - return; - - if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)) - return; - - /* - * Level -1 is used for the data extent, which is not reliable to cache - * because its reference count can increase or decrease without us - * realizing. We cache results only for extent buffers that lead from - * the root node down to the leaf with the file extent item. - */ - ASSERT(level >= 0); - - if (is_shared) - gen = btrfs_get_last_root_drop_gen(root->fs_info); - else - gen = btrfs_root_last_snapshot(&root->root_item); - - entry = &ctx->path_cache_entries[level]; - entry->bytenr = bytenr; - entry->is_shared = is_shared; - entry->gen = gen; - - /* - * If we found an extent buffer is shared, set the cache result for all - * extent buffers below it to true. As nodes in the path are COWed, - * their sharedness is moved to their children, and if a leaf is COWed, - * then the sharedness of a data extent becomes direct, the refcount of - * data extent is increased in the extent item at the extent tree. - */ - if (is_shared) { - for (int i = 0; i < level; i++) { - entry = &ctx->path_cache_entries[i]; - entry->is_shared = is_shared; - entry->gen = gen; - } - } -} - struct btrfs_backref_share_check_ctx *btrfs_alloc_backref_share_check_ctx(void) { struct btrfs_backref_share_check_ctx *ctx; From patchwork Mon Oct 10 10:22:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002454 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 766C7C43217 for ; Mon, 10 Oct 2022 10:23:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231734AbiJJKXD (ORCPT ); Mon, 10 Oct 2022 06:23:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45926 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231637AbiJJKWq (ORCPT ); Mon, 10 Oct 2022 06:22:46 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FC2D6B161 for ; Mon, 10 Oct 2022 03:22:40 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 5600160ED6 for ; Mon, 10 Oct 2022 10:22:39 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 49AFEC433D6 for ; Mon, 10 Oct 2022 10:22:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397358; bh=Ice1Yw3zTo4s4h9B/XsbPKYpUa3av3Czv3NddB63058=; h=From:To:Subject:Date:In-Reply-To:References:From; b=g47KNSljRfIT21K34teTgn2PpoD48KDigYHt7UTniw0e5PqVO12jdJTCM/iasLRND uFLv32xoeOJXjHqIpfiaoRJWgcwGYjCEbaZdpaATQeHQQTl6A49nN+vPvxNLpqr0GQ AMyb9+kKRMcO0LiZvU4n+obUM9JVFsEb8W4fKntXDZjp59nuth4vixLuCdxcGNALvP 1C7WER1y0c1LkV258C25jEA8jcbekqNvZoiPRyv/LGODB8xLf0rxyEc3BPkOXGnHxS hnpMWrNuByt66+ihe6BHAIXjYIU0NbGnnqkfqt4cd1c8t3tSOfi7FA4nAa7TAVYcto dDvNXIET44JaA== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 17/18] btrfs: avoid duplicated resolution of indirect backrefs during fiemap Date: Mon, 10 Oct 2022 11:22:19 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During fiemap, when determining if a data extent is shared or not, if we don't find the extent is directly shared, then we need to determine if it's shared through subtrees. For that we need to resolve the indirect reference we found in order to figure out the path in the inode's fs tree, which is a path starting at the fs tree's root node and going down to the leaf that contains the file extent item that points to the data extent. We then proceed to determine if any extent buffer in that path is shared with other trees or not. Currently whenever we find the data extent that a file extent item points to is not directly shared, we always resolve the path in the fs tree, and then check if any extent buffer in the path is shared. This is a lot of work and when we have file extent items that belong to the same leaf, we have the same path, so we only need to calculate it once. This change does that, it keeps track of the current and previous leaf, and when we find that a data extent is not directly shared, we try to compute the fs tree path only once and then use it for every other file extent item in the same leaf, using the existing cached path result for the leaf as long as the cache results are valid. This saves us from doing expensive b+tree searches in the fs tree of our target inode, as well as other minor work. The following test was run on a non-debug kernel (Debian's default kernel config): $ cat test-with-snapshots.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi umount $DEV &> /dev/null mkfs.btrfs -f $DEV # Use compression to quickly create files with a lot of extents # (each with a size of 128K). mount -o compress=lzo $DEV $MNT # 40G gives 327680 extents, each with a size of 128K. xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar # Add some more files to increase the size of the fs and extent # trees (in the real world there's a lot of files and extents # from other files). xfs_io -f -c "pwrite -S 0xcd -b 1M 0 20G" $MNT/file1 xfs_io -f -c "pwrite -S 0xef -b 1M 0 20G" $MNT/file2 xfs_io -f -c "pwrite -S 0x73 -b 1M 0 20G" $MNT/file3 # Create a snapshot so all the extents become indirectly shared # through subtrees, with a generation less than or equals to the # generation used to create the snapshot. btrfs subvolume snapshot -r $MNT $MNT/snap1 umount $MNT mount -o compress=lzo $DEV $MNT start=$(date +%s%N) filefrag $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds (metadata not cached)" echo start=$(date +%s%N) filefrag $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds (metadata cached)" umount $MNT Result before applying this patch: (...) /mnt/sdi/foobar: 327680 extents found fiemap took 1204 milliseconds (metadata not cached) /mnt/sdi/foobar: 327680 extents found fiemap took 729 milliseconds (metadata cached) Result after applying this patch: (...) /mnt/sdi/foobar: 327680 extents found fiemap took 732 milliseconds (metadata not cached) /mnt/sdi/foobar: 327680 extents found fiemap took 421 milliseconds (metadata cached) That's a -46.1% total reduction for the metadata not cached case, and a -42.2% reduction for the cached metadata case. The test is somewhat limited in the sense the gains may be higher in practice, because in the test the filesystem is small, so we have small fs and extent trees, plus there's no concurrent access to the trees as well, therefore no lock contention there. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 64 +++++++++++++++++++++++++++++++++++++------- fs/btrfs/backref.h | 13 +++++++++ fs/btrfs/extent_io.c | 2 ++ 3 files changed, 69 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 977f07903156..5bf6ec88e670 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -16,8 +16,9 @@ #include "misc.h" #include "tree-mod-log.h" -/* Just an arbitrary number so we can be sure this happened */ -#define BACKREF_FOUND_SHARED 6 +/* Just arbitrary numbers so we can be sure one of these happened. */ +#define BACKREF_FOUND_SHARED 6 +#define BACKREF_FOUND_NOT_SHARED 7 struct extent_inode_elem { u64 inum; @@ -135,7 +136,8 @@ struct preftrees { * - decremented when a ref->count transitions to <1 */ struct share_check { - u64 root_objectid; + struct btrfs_backref_share_check_ctx *ctx; + struct btrfs_root *root; u64 inum; u64 data_bytenr; /* @@ -234,7 +236,7 @@ static void update_share_count(struct share_check *sc, int oldcount, else if (oldcount < 1 && newcount > 0) sc->share_count++; - if (newref->root_id == sc->root_objectid && + if (newref->root_id == sc->root->root_key.objectid && newref->wanted_disk_byte == sc->data_bytenr && newref->key_for_search.objectid == sc->inum) sc->self_ref_count += newref->count; @@ -727,7 +729,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, continue; } - if (sc && ref->root_id != sc->root_objectid) { + if (sc && ref->root_id != sc->root->root_key.objectid) { free_pref(ref); ret = BACKREF_FOUND_SHARED; goto out; @@ -1438,6 +1440,44 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, */ ASSERT(extent_is_shared(sc) == 0); + /* + * If we are here for a data extent and we have a share_check structure + * it means the data extent is not directly shared (does not have + * multiple reference items), so we have to check if a path in the fs + * tree (going from the root node down to the leaf that has the file + * extent item pointing to the data extent) is shared, that is, if any + * of the extent buffers in the path is referenced by other trees. + */ + if (sc && bytenr == sc->data_bytenr) { + /* + * If we are only determining if a data extent is shared or not + * and the corresponding file extent item is located in the same + * leaf as the previous file extent item, we can skip resolving + * indirect references for a data extent, since the fs tree path + * is the same (same leaf, so same path). We skip as long as the + * cached result for the leaf is valid and only if there's only + * one file extent item pointing to the data extent, because in + * the case of multiple file extent items, they may be located + * in different leaves and therefore we have multiple paths. + */ + if (sc->ctx->curr_leaf_bytenr == sc->ctx->prev_leaf_bytenr && + sc->self_ref_count == 1) { + bool cached; + bool is_shared; + + cached = lookup_backref_shared_cache(sc->ctx, sc->root, + sc->ctx->curr_leaf_bytenr, + 0, &is_shared); + if (cached) { + if (is_shared) + ret = BACKREF_FOUND_SHARED; + else + ret = BACKREF_FOUND_NOT_SHARED; + goto out; + } + } + } + btrfs_release_path(path); ret = add_missing_keys(fs_info, &preftrees, path->skip_locking == 0); @@ -1718,7 +1758,8 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, struct btrfs_seq_list elem = BTRFS_SEQ_LIST_INIT(elem); int ret = 0; struct share_check shared = { - .root_objectid = root->root_key.objectid, + .ctx = ctx, + .root = root, .inum = btrfs_ino(inode), .data_bytenr = bytenr, .share_count = 0, @@ -1755,12 +1796,13 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, &ctx->refs, NULL, NULL, &shared, false); - if (ret == BACKREF_FOUND_SHARED) { - /* this is the only condition under which we return 1 */ - ret = 1; + if (ret == BACKREF_FOUND_SHARED || + ret == BACKREF_FOUND_NOT_SHARED) { + /* If shared must return 1, otherwise return 0. */ + ret = (ret == BACKREF_FOUND_SHARED) ? 1 : 0; if (level >= 0) store_backref_shared_cache(ctx, root, bytenr, - level, true); + level, ret == 1); break; } if (ret < 0 && ret != -ENOENT) @@ -1836,6 +1878,8 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, } out: ulist_release(&ctx->refs); + ctx->prev_leaf_bytenr = ctx->curr_leaf_bytenr; + return ret; } diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index fda78db50be6..6dac462430b0 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -28,6 +28,19 @@ struct btrfs_backref_shared_cache_entry { struct btrfs_backref_share_check_ctx { /* Ulists used during backref walking. */ struct ulist refs; + /* + * The current leaf the caller of btrfs_is_data_extent_shared() is at. + * Typically the caller (at the moment only fiemap) tries to determine + * the sharedness of data extents point by file extent items from entire + * leaves. + */ + u64 curr_leaf_bytenr; + /* + * The previous leaf the caller was at in the previous call to + * btrfs_is_data_extent_shared(). This may be the same as the current + * leaf. On the first call it must be 0. + */ + u64 prev_leaf_bytenr; /* * A path from a root to a leaf that has a file extent item pointing to * a given data extent should never exceed the maximum b+tree height. diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index de62bdf11b33..e7bf750df366 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3969,6 +3969,8 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, if (extent_end <= lockstart) goto next_item; + backref_ctx->curr_leaf_bytenr = leaf->start; + /* We have in implicit hole (NO_HOLES feature enabled). */ if (prev_extent_end < key.offset) { const u64 range_end = min(key.offset, lockend) - 1; From patchwork Mon Oct 10 10:22:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 13002453 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73067C433FE for ; Mon, 10 Oct 2022 10:23:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231715AbiJJKXC (ORCPT ); Mon, 10 Oct 2022 06:23:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231893AbiJJKWq (ORCPT ); Mon, 10 Oct 2022 06:22:46 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 628BE6BCC6 for ; Mon, 10 Oct 2022 03:22:40 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3A6AE60ED5 for ; Mon, 10 Oct 2022 10:22:40 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2E597C433C1 for ; Mon, 10 Oct 2022 10:22:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665397359; bh=/kTG2UJKipX4TKaeo8Mdtxdz5XgArFpDJRCrjRGRGLA=; h=From:To:Subject:Date:In-Reply-To:References:From; b=gyGnXg9B+41SKgUia+0dWJJ7s/axuu1Dx3eEtfBjfjUo06oISiiMzMARa3kYsvFjD D99FthJZLAn2I4Psk5RRnGKU7WztkGNXD0oQ1+WAsPf3dhukqFq67djBSYfqgSpTHn GRbfzB5capzpIKdpTVTzhxHOGZomnjPuycqJqx3Jfyq5+7bWUZWJ/N2cBcU477xjlS f16UjvvWaIqarO6TjyppdJ9004oY0oUioOyVs/Pk/LKpSHBj+61oAMAQu+Np2CR1AV LpyaT9PH97SXnOwqnIBZGZZdejLvXjdKU5a7qpR5wAP3h0wy/yQFr/UC4jbg4xksC/ m7mOf/aB0iTmw== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 18/18] btrfs: avoid unnecessary resolution of indirect backrefs during fiemap Date: Mon, 10 Oct 2022 11:22:20 +0100 Message-Id: <5eea9ed9ca371b0ce6b360039b2d02394aa7844f.1665396437.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During fiemap, when determining if a data extent is shared or not, if we don't find the extent is directly shared, then we need to determine if it's shared through subtrees. For that we need to resolve the indirect reference we found in order to figure out the path in the inode's fs tree, which is a path starting at the fs tree's root node and going down to the leaf that contains the file extent item that points to the data extent. We then proceed to determine if any extent buffer in that path is shared with other trees or not. However when the generation of the data extent is more recent than the last generation used to snapshot the root, we don't need to determine the path, since the data extent can not be shared through snapshots. For this case we currently still determine the leaf of that path (at find_parent_nodes(), but then stop determining the other nodes in the path (at btrfs_is_data_extent_shared()) as it's pointless. So do the check of the data extent's generation earlier, at find_parent_nodes(), before trying to resolve the indirect reference to determine the leaf in the path. This saves us from doing one expensive b+tree search in the fs tree of our target inode, as well as other minor work. The following test was run on a non-debug kernel (Debian's default kernel config): $ cat test-fiemap.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi umount $DEV &> /dev/null mkfs.btrfs -f $DEV # Use compression to quickly create files with a lot of extents # (each with a size of 128K). mount -o compress=lzo $DEV $MNT # 40G gives 327680 extents, each with a size of 128K. xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar # Add some more files to increase the size of the fs and extent # trees (in the real world there's a lot of files and extents # from other files). xfs_io -f -c "pwrite -S 0xcd -b 1M 0 20G" $MNT/file1 xfs_io -f -c "pwrite -S 0xef -b 1M 0 20G" $MNT/file2 xfs_io -f -c "pwrite -S 0x73 -b 1M 0 20G" $MNT/file3 umount $MNT mount -o compress=lzo $DEV $MNT start=$(date +%s%N) filefrag $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds (metadata not cached)" echo start=$(date +%s%N) filefrag $MNT/foobar end=$(date +%s%N) dur=$(( (end - start) / 1000000 )) echo "fiemap took $dur milliseconds (metadata cached)" umount $MNT Before applying this patch: (...) /mnt/sdi/foobar: 327680 extents found fiemap took 1285 milliseconds (metadata not cached) /mnt/sdi/foobar: 327680 extents found fiemap took 742 milliseconds (metadata cached) After applying this patch: (...) /mnt/sdi/foobar: 327680 extents found fiemap took 689 milliseconds (metadata not cached) /mnt/sdi/foobar: 327680 extents found fiemap took 393 milliseconds (metadata cached) That's a -46.4% total reduction for the metadata not cached case, and a -47.0% reduction for the cached metadata case. The test is somewhat limited in the sense the gains may be higher in practice, because in the test the filesystem is small, so we have small fs and extent trees, plus there's no concurrent access to the trees as well, therefore no lock contention there. Signed-off-by: Filipe Manana --- fs/btrfs/backref.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 5bf6ec88e670..a0f995bb9411 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -140,6 +140,7 @@ struct share_check { struct btrfs_root *root; u64 inum; u64 data_bytenr; + u64 data_extent_gen; /* * Counts number of inodes that refer to an extent (different inodes in * the same root or different roots) that we could find. The sharedness @@ -1449,6 +1450,21 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, * of the extent buffers in the path is referenced by other trees. */ if (sc && bytenr == sc->data_bytenr) { + /* + * If our data extent is from a generation more recent than the + * last generation used to snapshot the root, then we know that + * it can not be shared through subtrees, so we can skip + * resolving indirect references, there's no point in + * determining the extent buffers for the path from the fs tree + * root node down to the leaf that has the file extent item that + * points to the data extent. + */ + if (sc->data_extent_gen > + btrfs_root_last_snapshot(&sc->root->root_item)) { + ret = BACKREF_FOUND_NOT_SHARED; + goto out; + } + /* * If we are only determining if a data extent is shared or not * and the corresponding file extent item is located in the same @@ -1762,6 +1778,7 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, .root = root, .inum = btrfs_ino(inode), .data_bytenr = bytenr, + .data_extent_gen = extent_gen, .share_count = 0, .self_ref_count = 0, }; @@ -1808,17 +1825,6 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr, if (ret < 0 && ret != -ENOENT) break; ret = 0; - /* - * If our data extent is not shared through reflinks and it was - * created in a generation after the last one used to create a - * snapshot of the inode's root, then it can not be shared - * indirectly through subtrees, as that can only happen with - * snapshots. In this case bail out, no need to check for the - * sharedness of extent buffers. - */ - if (level == -1 && - extent_gen > btrfs_root_last_snapshot(&root->root_item)) - break; /* * If our data extent was not directly shared (without multiple