diff mbox series

[1/2] btrfs-progs: check: add check and repair ability for super num devs mismatch

Message ID 029df99dabfee5b8fc602bf284bb3ea364176418.1646009185.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs-progs: add check and repair ability for super num devices mismatch | expand

Commit Message

Qu Wenruo Feb. 28, 2022, 12:50 a.m. UTC
[BUG]
There is a bug report of kernel rejecting fs which has a mismatch in
super num devices and num devices found in chunk tree.

But btrfs-check reports no problem about the fs.

[CAUSE]
We just didn't verify super num devices against the result found in
chunk tree.

[FIX]
Add such check and repair ability for btrfs-check.

The ability is mode independent.

Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 check/main.c        |  1 +
 check/mode-common.c | 88 +++++++++++++++++++++++++++++++++++++++++++++
 check/mode-common.h |  2 ++
 3 files changed, 91 insertions(+)

Comments

David Sterba March 23, 2022, 5:43 p.m. UTC | #1
On Mon, Feb 28, 2022 at 08:50:07AM +0800, Qu Wenruo wrote:
> [BUG]
> There is a bug report of kernel rejecting fs which has a mismatch in
> super num devices and num devices found in chunk tree.
> 
> But btrfs-check reports no problem about the fs.
> 
> [CAUSE]
> We just didn't verify super num devices against the result found in
> chunk tree.
> 
> [FIX]
> Add such check and repair ability for btrfs-check.
> 
> The ability is mode independent.
> 
> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>

With this patch applied there's a new test failure:

=== START TEST .../tests/fsck-tests/014-no-extent-info
testing image no_extent.raw.restored
====== RUN MUSTFAIL .../btrfs check ./no_extent.raw.restored
Opening filesystem to check...
Checking filesystem on ./no_extent.raw.restored
UUID: aeee7297-37a4-4547-a8a9-7b870d58d31f
cache and super generation don't match, space cache will be invalidated
found 180224 bytes used, no error found
total csum bytes: 0
total tree bytes: 180224
total fs tree bytes: 81920
total extent tree bytes: 16384
btree space waste bytes: 138540
file data blocks allocated: 0
 referenced 0
succeeded (unexpected!): .../btrfs check ./no_extent.raw.restored
unexpected success: btrfs check should have detected corruption
Qu Wenruo March 23, 2022, 11:15 p.m. UTC | #2
On 2022/3/24 01:43, David Sterba wrote:
> On Mon, Feb 28, 2022 at 08:50:07AM +0800, Qu Wenruo wrote:
>> [BUG]
>> There is a bug report of kernel rejecting fs which has a mismatch in
>> super num devices and num devices found in chunk tree.
>>
>> But btrfs-check reports no problem about the fs.
>>
>> [CAUSE]
>> We just didn't verify super num devices against the result found in
>> chunk tree.
>>
>> [FIX]
>> Add such check and repair ability for btrfs-check.
>>
>> The ability is mode independent.
>>
>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
>> Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> With this patch applied there's a new test failure:
>
> === START TEST .../tests/fsck-tests/014-no-extent-info
> testing image no_extent.raw.restored
> ====== RUN MUSTFAIL .../btrfs check ./no_extent.raw.restored
> Opening filesystem to check...
> Checking filesystem on ./no_extent.raw.restored
> UUID: aeee7297-37a4-4547-a8a9-7b870d58d31f
> cache and super generation don't match, space cache will be invalidated
> found 180224 bytes used, no error found
> total csum bytes: 0
> total tree bytes: 180224
> total fs tree bytes: 81920
> total extent tree bytes: 16384
> btree space waste bytes: 138540
> file data blocks allocated: 0
>   referenced 0
> succeeded (unexpected!): .../btrfs check ./no_extent.raw.restored
> unexpected success: btrfs check should have detected corruption


Weird, the problem is, for the restored image, if I run ./btrfs check
manually it can detects the problem, but still go "no error found" path.

So it must be my patch overriding the return value.

Will fix it soon.

Thanks,
Qu
Qu Wenruo March 23, 2022, 11:38 p.m. UTC | #3
On 2022/3/24 07:15, Qu Wenruo wrote:
>
>
> On 2022/3/24 01:43, David Sterba wrote:
>> On Mon, Feb 28, 2022 at 08:50:07AM +0800, Qu Wenruo wrote:
>>> [BUG]
>>> There is a bug report of kernel rejecting fs which has a mismatch in
>>> super num devices and num devices found in chunk tree.
>>>
>>> But btrfs-check reports no problem about the fs.
>>>
>>> [CAUSE]
>>> We just didn't verify super num devices against the result found in
>>> chunk tree.
>>>
>>> [FIX]
>>> Add such check and repair ability for btrfs-check.
>>>
>>> The ability is mode independent.
>>>
>>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
>>> Link:
>>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>
>> With this patch applied there's a new test failure:
>>
>> === START TEST .../tests/fsck-tests/014-no-extent-info
>> testing image no_extent.raw.restored
>> ====== RUN MUSTFAIL .../btrfs check ./no_extent.raw.restored
>> Opening filesystem to check...
>> Checking filesystem on ./no_extent.raw.restored
>> UUID: aeee7297-37a4-4547-a8a9-7b870d58d31f
>> cache and super generation don't match, space cache will be invalidated
>> found 180224 bytes used, no error found
>> total csum bytes: 0
>> total tree bytes: 180224
>> total fs tree bytes: 81920
>> total extent tree bytes: 16384
>> btree space waste bytes: 138540
>> file data blocks allocated: 0
>>   referenced 0
>> succeeded (unexpected!): .../btrfs check ./no_extent.raw.restored
>> unexpected success: btrfs check should have detected corruption
>
>
> Weird, the problem is, for the restored image, if I run ./btrfs check
> manually it can detects the problem, but still go "no error found" path.
>
> So it must be my patch overriding the return value.

Indeed that's the case.

Fix upon current devel branch is:
https://lore.kernel.org/linux-btrfs/f576d7a548b91d42d02b17d2dc52ee04d943a57d.1648077794.git.wqu@suse.com/T/#u

Please fold the fix into the patch.

Thanks,
Qu
>
> Will fix it soon.
>
> Thanks,
> Qu
diff mbox series

Patch

diff --git a/check/main.c b/check/main.c
index 8ccba4478de8..b29f6266b974 100644
--- a/check/main.c
+++ b/check/main.c
@@ -9140,6 +9140,7 @@  static int do_check_chunks_and_extents(void)
 		if (ret > 0)
 			ret = 0;
 	}
+	ret = check_and_repair_super_num_devs(gfs_info);
 	return ret;
 }
 
diff --git a/check/mode-common.c b/check/mode-common.c
index c3d8bb45c6b6..a2c6c7732f4e 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -1583,3 +1583,91 @@  int fill_csum_tree(struct btrfs_trans_handle *trans, bool search_fs_tree)
 	}
 	return ret;
 }
+
+static int get_num_devs_in_chunk_tree(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *chunk_root = fs_info->chunk_root;
+	struct btrfs_path path = {0};
+	struct btrfs_key key = {0};
+	int found_devs = 0;
+	int ret;
+
+	ret = btrfs_search_slot(NULL, chunk_root, &key, &path, 0, 0);
+	if (ret < 0)
+		return ret;
+
+	/* We should be the first slot, and chunk tree should not be empty*/
+	ASSERT(path.slots[0] == 0 && btrfs_header_nritems(path.nodes[0]));
+
+	btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+
+	while (key.objectid == BTRFS_DEV_ITEMS_OBJECTID &&
+	       key.type == BTRFS_DEV_ITEM_KEY) {
+		found_devs++;
+
+		ret = btrfs_next_item(chunk_root, &path);
+		if (ret < 0)
+			break;
+
+		/*
+		 * This should not happen, as we should have CHUNK items after
+		 * dev items, but since we're only to get the num devices,
+		 * no need to bother the problem.
+		 */
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+	}
+	btrfs_release_path(&path);
+	if (ret < 0)
+		return ret;
+	return found_devs;
+}
+
+int check_and_repair_super_num_devs(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_trans_handle *trans;
+	int found_devs;
+	int ret;
+
+	ret = get_num_devs_in_chunk_tree(fs_info);
+	if (ret < 0)
+		return ret;
+
+	found_devs = ret;
+
+	if (found_devs == btrfs_super_num_devices(fs_info->super_copy))
+		return 0;
+
+	/* Now the found devs in chunk tree mismatch with super block*/
+	error("super num devices mismatch, have %llu expect %u",
+	      btrfs_super_num_devices(fs_info->super_copy),
+	      found_devs);
+
+	if (!repair)
+		return -EUCLEAN;
+
+	/*
+	 * Repair is pretty simple, just reset the super block value and
+	 * commit a new transaction.
+	 */
+	trans = btrfs_start_transaction(fs_info->tree_root, 0);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		errno = -ret;
+		error("failed to start trans: %m");
+		return ret;
+	}
+	btrfs_set_super_num_devices(fs_info->super_copy, found_devs);
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to commit trans: %m");
+		btrfs_abort_transaction(trans, ret);
+		return ret;
+	}
+	printf("Successfully reset super num devices to %u\n", found_devs);
+	return 0;
+}
diff --git a/check/mode-common.h b/check/mode-common.h
index b5e6b727fe73..d5bab85a4f5e 100644
--- a/check/mode-common.h
+++ b/check/mode-common.h
@@ -201,4 +201,6 @@  int repair_dev_item_bytes_used(struct btrfs_fs_info *fs_info,
 
 int fill_csum_tree(struct btrfs_trans_handle *trans, bool search_fs_tree);
 
+int check_and_repair_super_num_devs(struct btrfs_fs_info *fs_info);
+
 #endif