From patchwork Mon Aug 1 13:57:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 12933795 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81C6EC19F2B for ; Mon, 1 Aug 2022 13:58:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232067AbiHAN6C (ORCPT ); Mon, 1 Aug 2022 09:58:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231352AbiHAN6A (ORCPT ); Mon, 1 Aug 2022 09:58:00 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71ACD13CF1 for ; Mon, 1 Aug 2022 06:57:59 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id EC5FC6126E for ; Mon, 1 Aug 2022 13:57:58 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D77ACC433D7 for ; Mon, 1 Aug 2022 13:57:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1659362278; bh=C5k3MavFlKdVVjILbxt9C0MpxsbVbkWmL136fxrHzQ4=; h=From:To:Subject:Date:In-Reply-To:References:From; b=fy5PwOV+jCbu+S2qKIVGtAtvBYF1BhJntJZZ6685e7dJLpH6VIg5vuwE4chzt7nRi UOE+UYyS7yhMp9sTm5N19mmgRUbwgZWaheZ995HNTV93v4LJPK/QujLgt69qolP1bk 35QkA7dyAFNlauV0qjr2vsWqMCqVDY/PT+jFVAe5gyzVrNyymSRL7AFK6euRjkxk+7 d8Um6Us5j8CKLgy9JxXN4yJQE9yF9eZr12Dtzlk0ibuvbpahjejjP6iL8+I8/HsBjN w1za7pnSpQP5gAgqIW4JqPF3XhVUgJgnTZF30XporQU7tBd+af2X9343JY2mSnXtCe PbuZQJSG9BNNg== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 1/3] btrfs: fix lost error handling when looking up extended ref on log replay Date: Mon, 1 Aug 2022 14:57:51 +0100 Message-Id: <49766a99f4f0fe3f0757fc8bfaeae6c7343314a6.1659361747.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During log replay, when processing inode references, if we get an error when looking up for an extended reference at __add_inode_ref(), we ignore it and proceed, returning success (0) if no other error happens after the lookup. This is obviously wrong because in case an extended reference exists and it encodes some name not in the log, we need to unlink it, otherwise the filesystem state will not match the state it had after the last fsync. So just make __add_inode_ref() return an error it gets from the extended reference lookup. Fixes: f186373fef005c ("btrfs: extended inode refs") Signed-off-by: Filipe Manana --- fs/btrfs/tree-log.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 9acf68ef4a49..ccf93ac58e73 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -1146,7 +1146,9 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans, extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen, inode_objectid, parent_objectid, 0, 0); - if (!IS_ERR_OR_NULL(extref)) { + if (IS_ERR(extref)) { + return PTR_ERR(extref); + } else if (extref) { u32 item_size; u32 cur_offset = 0; unsigned long base; From patchwork Mon Aug 1 13:57:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 12933796 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2298DC00144 for ; Mon, 1 Aug 2022 13:58:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231322AbiHAN6D (ORCPT ); Mon, 1 Aug 2022 09:58:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51118 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231640AbiHAN6B (ORCPT ); Mon, 1 Aug 2022 09:58:01 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F5822DE8 for ; Mon, 1 Aug 2022 06:58:00 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id E0BB26131B for ; Mon, 1 Aug 2022 13:57:59 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CA9D5C433D6 for ; Mon, 1 Aug 2022 13:57:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1659362279; bh=djzP1n4LjsaCB08LT8QfOKNNx4Eas9PBHdRP6Bwd5iM=; h=From:To:Subject:Date:In-Reply-To:References:From; b=d2fmFPEJMj5zFBfVW80tY4Lwt/Jx1FVsAqsPbZCBK+6tb6ChRpcFZ4Iz8TC8SEsWs ARHbiEa+uCsrSUlHFBARNWJ04ke7VIcqWR1oag/zkDGoHii1A7VjQNW99XCubsmc4e ZiIL02KG8/G0BWhTIyX2n6QnounueZkIO/Sx7eF86CZGvSa5bwQKmEVswExuGbZftH jxruSZgmWok4OaPnS2In+3zhh/i69fw4kRoA8KHXnqYt208I5QDxzdZFf4HGpdXABX gmNTP/yzt1IYXv4mrqF/u1shOvKbwnPNXcMD05ftefi1QbCUtaaldSDyJTlkLwFmt5 oOYXXzik+yM1A== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/3] btrfs: fix warning during log replay when bumping inode link count Date: Mon, 1 Aug 2022 14:57:52 +0100 Message-Id: <55311bcaf0bce109e52a7032bf9148dccff65d10.1659361747.git.fdmanana@suse.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During log replay, at add_link(), we may increment the link count of another inode that has a reference that conflicts with a new reference for the inode currently being processed. During log replay, at add_link(), we may drop (unlink) a reference from some inode in the subvolume tree if that reference conflicts with a new reference found in the log for the inode we are currently processing. After the unlink, If the link count has decreased from 1 to 0, then we increment the link count to prevent the inode from being deleted if it's evicted by an iput() call, because we may have references to add to that inode later on (and we will fixup its link count later during log replay). However incrementing the link count from 0 to 1 triggers a warning: $ cat fs/inode.c (...) void inc_nlink(struct inode *inode) { if (unlikely(inode->i_nlink == 0)) { WARN_ON(!(inode->i_state & I_LINKABLE)); atomic_long_dec(&inode->i_sb->s_remove_count); } (...) The I_LINKABLE flag is only set when creating an O_TMPFILE file, so it's never set during log replay. Most of the time, the warning isn't triggered even if we dropped the last reference of the conflicting inode, and this is because: 1) The conflicting inode was previously marked for fixup, through a call to link_to_fixup_dir(), which increments the inode's link count; 2) And the last iput() on the inode has not triggered eviction of the inode, nor was eviction triggered after the iput(). So at add_link(), even if we unlink the last reference of the inode, its link count ends up being 1 and not 0. So this means that if eviction is triggered after link_to_fixup_dir() is called, at add_link() we will read the inode back from the subvolume tree and have it with a correct link count, matching the number of references it has on the subvolume tree. So if when we are at add_link() the inode has exactly one reference only, its link count is 1, and after the unlink its link count becomes 0. So fix this by using set_nlink() instead of inc_nlink(), as the former accepts a transition from 0 to 1 and it's what we use in other similar contextes (like at link_to_fixup_dir(). Also make add_inode_ref() use set_nlink() instead of inc_nlink() to bump the link count from 0 to 1. The warning is actually harmless, but it may scare users. Josef also ran into it recently. CC: stable@vger.kernel.org # 5.1+ Signed-off-by: Filipe Manana --- fs/btrfs/tree-log.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index ccf93ac58e73..ff786e712f2b 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -1459,7 +1459,7 @@ static int add_link(struct btrfs_trans_handle *trans, * on the inode will not free it. We will fixup the link count later. */ if (other_inode->i_nlink == 0) - inc_nlink(other_inode); + set_nlink(other_inode, 1); add_link: ret = btrfs_add_link(trans, BTRFS_I(dir), BTRFS_I(inode), name, namelen, 0, ref_index); @@ -1602,7 +1602,7 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, * free it. We will fixup the link count later. */ if (!ret && inode->i_nlink == 0) - inc_nlink(inode); + set_nlink(inode, 1); } if (ret < 0) goto out; From patchwork Mon Aug 1 13:57:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 12933797 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E1D2C19F2A for ; Mon, 1 Aug 2022 13:58:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232128AbiHAN6E (ORCPT ); Mon, 1 Aug 2022 09:58:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232113AbiHAN6C (ORCPT ); Mon, 1 Aug 2022 09:58:02 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6660013CF1 for ; Mon, 1 Aug 2022 06:58:01 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id E63AD61278 for ; Mon, 1 Aug 2022 13:58:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CCC80C433D7 for ; Mon, 1 Aug 2022 13:57:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1659362280; bh=74gZrIuhOm1HYR1n6XfxBr7oe5O/RXpSdFZV3ask5jc=; h=From:To:Subject:Date:In-Reply-To:References:From; b=UktZnHrewDR1P04naJ/HUnn43bwBc7+O+qzJLc26pNaSUwORfVlEEQpILOR08P7wN IsTuvryyeJucewCzGP2+cIbUASSmtXGshLu8TG43lB1SGmJxL+i97KlncFaJrsFQW7 dJsM7SAiQ8BwfcZdeYN8ezAHRm1wfeROFVyNkkPDCqGTKzH85IOokWWnkWMbI9qY5V N3mf/NkPo4cq4x77nbpK68Gf8FROfgolHrpRd2KdjAg6OJKxzyZ+xnQvLv2bIx6Plc 06ePhfRwcLlYc1tqD9zhuweggu03zPX0ul+Q7zpH70qDi1+1srJ1+tVCiiWl5iCkRg mBxePGeftAXTQ== From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/3] btrfs: simplify adding and replacing references during log replay Date: Mon, 1 Aug 2022 14:57:53 +0100 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana During log replay, when adding/replacing inode references, there are two special cases that have special code for them: 1) When we have an inode with two or more hardlinks in the same directory, therefore two or more names incoded in the same inode reference item, and one of the hard links gets renamed to the old name of another hard link - that is, the index number for a name changes. This was added in commit 0d836392cadd55 ("Btrfs: fix mount failure after fsync due to hard link recreation"), and is covered by test case generic/502 from fstests; 2) When we have several inodes that got renamed to an old name of some other inode, in a cascading style. The code to deal with this special case was added in commit 6b5fc433a7ad67 ("Btrfs: fix fsync after succession of renames of different files"), and is covered by test cases generic/526 and generic/527 from fstests. Both cases can be deal with by making sure __add_inode_ref() is always called by add_inode_ref() for every name encoded in the inode reference item, and not just for the first name that has a conflict. With such change we no longer need that special casing for the two cases mentioned before. So do those changes. Signed-off-by: Filipe Manana --- fs/btrfs/tree-log.c | 164 ++++---------------------------------------- 1 file changed, 13 insertions(+), 151 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index ff786e712f2b..e3db3cbdb97b 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -1063,8 +1063,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans, struct btrfs_inode *dir, struct btrfs_inode *inode, u64 inode_objectid, u64 parent_objectid, - u64 ref_index, char *name, int namelen, - int *search_done) + u64 ref_index, char *name, int namelen) { int ret; char *victim_name; @@ -1126,19 +1125,12 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans, kfree(victim_name); if (ret) return ret; - *search_done = 1; goto again; } kfree(victim_name); ptr = (unsigned long)(victim_ref + 1) + victim_name_len; } - - /* - * NOTE: we have searched root tree and checked the - * corresponding ref, it does not need to check again. - */ - *search_done = 1; } btrfs_release_path(path); @@ -1202,14 +1194,12 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans, kfree(victim_name); if (ret) return ret; - *search_done = 1; goto again; } kfree(victim_name); next: cur_offset += victim_name_len + sizeof(*extref); } - *search_done = 1; } btrfs_release_path(path); @@ -1373,103 +1363,6 @@ static int unlink_old_inode_refs(struct btrfs_trans_handle *trans, return ret; } -static int btrfs_inode_ref_exists(struct inode *inode, struct inode *dir, - const u8 ref_type, const char *name, - const int namelen) -{ - struct btrfs_key key; - struct btrfs_path *path; - const u64 parent_id = btrfs_ino(BTRFS_I(dir)); - int ret; - - path = btrfs_alloc_path(); - if (!path) - return -ENOMEM; - - key.objectid = btrfs_ino(BTRFS_I(inode)); - key.type = ref_type; - if (key.type == BTRFS_INODE_REF_KEY) - key.offset = parent_id; - else - key.offset = btrfs_extref_hash(parent_id, name, namelen); - - ret = btrfs_search_slot(NULL, BTRFS_I(inode)->root, &key, path, 0, 0); - if (ret < 0) - goto out; - if (ret > 0) { - ret = 0; - goto out; - } - if (key.type == BTRFS_INODE_EXTREF_KEY) - ret = !!btrfs_find_name_in_ext_backref(path->nodes[0], - path->slots[0], parent_id, name, namelen); - else - ret = !!btrfs_find_name_in_backref(path->nodes[0], path->slots[0], - name, namelen); - -out: - btrfs_free_path(path); - return ret; -} - -static int add_link(struct btrfs_trans_handle *trans, - struct inode *dir, struct inode *inode, const char *name, - int namelen, u64 ref_index) -{ - struct btrfs_root *root = BTRFS_I(dir)->root; - struct btrfs_dir_item *dir_item; - struct btrfs_key key; - struct btrfs_path *path; - struct inode *other_inode = NULL; - int ret; - - path = btrfs_alloc_path(); - if (!path) - return -ENOMEM; - - dir_item = btrfs_lookup_dir_item(NULL, root, path, - btrfs_ino(BTRFS_I(dir)), - name, namelen, 0); - if (!dir_item) { - btrfs_release_path(path); - goto add_link; - } else if (IS_ERR(dir_item)) { - ret = PTR_ERR(dir_item); - goto out; - } - - /* - * Our inode's dentry collides with the dentry of another inode which is - * in the log but not yet processed since it has a higher inode number. - * So delete that other dentry. - */ - btrfs_dir_item_key_to_cpu(path->nodes[0], dir_item, &key); - btrfs_release_path(path); - other_inode = read_one_inode(root, key.objectid); - if (!other_inode) { - ret = -ENOENT; - goto out; - } - ret = unlink_inode_for_log_replay(trans, BTRFS_I(dir), BTRFS_I(other_inode), - name, namelen); - if (ret) - goto out; - /* - * If we dropped the link count to 0, bump it so that later the iput() - * on the inode will not free it. We will fixup the link count later. - */ - if (other_inode->i_nlink == 0) - set_nlink(other_inode, 1); -add_link: - ret = btrfs_add_link(trans, BTRFS_I(dir), BTRFS_I(inode), - name, namelen, 0, ref_index); -out: - iput(other_inode); - btrfs_free_path(path); - - return ret; -} - /* * replay one inode back reference item found in the log tree. * eb, slot and key refer to the buffer and key found in the log tree. @@ -1490,7 +1383,6 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, char *name = NULL; int namelen; int ret; - int search_done = 0; int log_ref_ver = 0; u64 parent_objectid; u64 inode_objectid; @@ -1565,51 +1457,21 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, * overwrite any existing back reference, and we don't * want to create dangling pointers in the directory. */ - - if (!search_done) { - ret = __add_inode_ref(trans, root, path, log, - BTRFS_I(dir), - BTRFS_I(inode), - inode_objectid, - parent_objectid, - ref_index, name, namelen, - &search_done); - if (ret) { - if (ret == 1) - ret = 0; - goto out; - } - } - - /* - * If a reference item already exists for this inode - * with the same parent and name, but different index, - * drop it and the corresponding directory index entries - * from the parent before adding the new reference item - * and dir index entries, otherwise we would fail with - * -EEXIST returned from btrfs_add_link() below. - */ - ret = btrfs_inode_ref_exists(inode, dir, key->type, - name, namelen); - if (ret > 0) { - ret = unlink_inode_for_log_replay(trans, - BTRFS_I(dir), - BTRFS_I(inode), - name, namelen); - /* - * If we dropped the link count to 0, bump it so - * that later the iput() on the inode will not - * free it. We will fixup the link count later. - */ - if (!ret && inode->i_nlink == 0) - set_nlink(inode, 1); - } - if (ret < 0) + ret = __add_inode_ref(trans, root, path, log, + BTRFS_I(dir), + BTRFS_I(inode), + inode_objectid, + parent_objectid, + ref_index, name, namelen); + if (ret) { + if (ret == 1) + ret = 0; goto out; + } /* insert our name */ - ret = add_link(trans, dir, inode, name, namelen, - ref_index); + ret = btrfs_add_link(trans, BTRFS_I(dir), BTRFS_I(inode), + name, namelen, 0, ref_index); if (ret) goto out;