Message ID | 029df99dabfee5b8fc602bf284bb3ea364176418.1646009185.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs-progs: add check and repair ability for super num devices mismatch | expand |
On Mon, Feb 28, 2022 at 08:50:07AM +0800, Qu Wenruo wrote: > [BUG] > There is a bug report of kernel rejecting fs which has a mismatch in > super num devices and num devices found in chunk tree. > > But btrfs-check reports no problem about the fs. > > [CAUSE] > We just didn't verify super num devices against the result found in > chunk tree. > > [FIX] > Add such check and repair ability for btrfs-check. > > The ability is mode independent. > > Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com> > Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/ > Signed-off-by: Qu Wenruo <wqu@suse.com> With this patch applied there's a new test failure: === START TEST .../tests/fsck-tests/014-no-extent-info testing image no_extent.raw.restored ====== RUN MUSTFAIL .../btrfs check ./no_extent.raw.restored Opening filesystem to check... Checking filesystem on ./no_extent.raw.restored UUID: aeee7297-37a4-4547-a8a9-7b870d58d31f cache and super generation don't match, space cache will be invalidated found 180224 bytes used, no error found total csum bytes: 0 total tree bytes: 180224 total fs tree bytes: 81920 total extent tree bytes: 16384 btree space waste bytes: 138540 file data blocks allocated: 0 referenced 0 succeeded (unexpected!): .../btrfs check ./no_extent.raw.restored unexpected success: btrfs check should have detected corruption
On 2022/3/24 01:43, David Sterba wrote: > On Mon, Feb 28, 2022 at 08:50:07AM +0800, Qu Wenruo wrote: >> [BUG] >> There is a bug report of kernel rejecting fs which has a mismatch in >> super num devices and num devices found in chunk tree. >> >> But btrfs-check reports no problem about the fs. >> >> [CAUSE] >> We just didn't verify super num devices against the result found in >> chunk tree. >> >> [FIX] >> Add such check and repair ability for btrfs-check. >> >> The ability is mode independent. >> >> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com> >> Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/ >> Signed-off-by: Qu Wenruo <wqu@suse.com> > > With this patch applied there's a new test failure: > > === START TEST .../tests/fsck-tests/014-no-extent-info > testing image no_extent.raw.restored > ====== RUN MUSTFAIL .../btrfs check ./no_extent.raw.restored > Opening filesystem to check... > Checking filesystem on ./no_extent.raw.restored > UUID: aeee7297-37a4-4547-a8a9-7b870d58d31f > cache and super generation don't match, space cache will be invalidated > found 180224 bytes used, no error found > total csum bytes: 0 > total tree bytes: 180224 > total fs tree bytes: 81920 > total extent tree bytes: 16384 > btree space waste bytes: 138540 > file data blocks allocated: 0 > referenced 0 > succeeded (unexpected!): .../btrfs check ./no_extent.raw.restored > unexpected success: btrfs check should have detected corruption Weird, the problem is, for the restored image, if I run ./btrfs check manually it can detects the problem, but still go "no error found" path. So it must be my patch overriding the return value. Will fix it soon. Thanks, Qu
On 2022/3/24 07:15, Qu Wenruo wrote: > > > On 2022/3/24 01:43, David Sterba wrote: >> On Mon, Feb 28, 2022 at 08:50:07AM +0800, Qu Wenruo wrote: >>> [BUG] >>> There is a bug report of kernel rejecting fs which has a mismatch in >>> super num devices and num devices found in chunk tree. >>> >>> But btrfs-check reports no problem about the fs. >>> >>> [CAUSE] >>> We just didn't verify super num devices against the result found in >>> chunk tree. >>> >>> [FIX] >>> Add such check and repair ability for btrfs-check. >>> >>> The ability is mode independent. >>> >>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com> >>> Link: >>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/ >>> >>> Signed-off-by: Qu Wenruo <wqu@suse.com> >> >> With this patch applied there's a new test failure: >> >> === START TEST .../tests/fsck-tests/014-no-extent-info >> testing image no_extent.raw.restored >> ====== RUN MUSTFAIL .../btrfs check ./no_extent.raw.restored >> Opening filesystem to check... >> Checking filesystem on ./no_extent.raw.restored >> UUID: aeee7297-37a4-4547-a8a9-7b870d58d31f >> cache and super generation don't match, space cache will be invalidated >> found 180224 bytes used, no error found >> total csum bytes: 0 >> total tree bytes: 180224 >> total fs tree bytes: 81920 >> total extent tree bytes: 16384 >> btree space waste bytes: 138540 >> file data blocks allocated: 0 >> referenced 0 >> succeeded (unexpected!): .../btrfs check ./no_extent.raw.restored >> unexpected success: btrfs check should have detected corruption > > > Weird, the problem is, for the restored image, if I run ./btrfs check > manually it can detects the problem, but still go "no error found" path. > > So it must be my patch overriding the return value. Indeed that's the case. Fix upon current devel branch is: https://lore.kernel.org/linux-btrfs/f576d7a548b91d42d02b17d2dc52ee04d943a57d.1648077794.git.wqu@suse.com/T/#u Please fold the fix into the patch. Thanks, Qu > > Will fix it soon. > > Thanks, > Qu
diff --git a/check/main.c b/check/main.c index 8ccba4478de8..b29f6266b974 100644 --- a/check/main.c +++ b/check/main.c @@ -9140,6 +9140,7 @@ static int do_check_chunks_and_extents(void) if (ret > 0) ret = 0; } + ret = check_and_repair_super_num_devs(gfs_info); return ret; } diff --git a/check/mode-common.c b/check/mode-common.c index c3d8bb45c6b6..a2c6c7732f4e 100644 --- a/check/mode-common.c +++ b/check/mode-common.c @@ -1583,3 +1583,91 @@ int fill_csum_tree(struct btrfs_trans_handle *trans, bool search_fs_tree) } return ret; } + +static int get_num_devs_in_chunk_tree(struct btrfs_fs_info *fs_info) +{ + struct btrfs_root *chunk_root = fs_info->chunk_root; + struct btrfs_path path = {0}; + struct btrfs_key key = {0}; + int found_devs = 0; + int ret; + + ret = btrfs_search_slot(NULL, chunk_root, &key, &path, 0, 0); + if (ret < 0) + return ret; + + /* We should be the first slot, and chunk tree should not be empty*/ + ASSERT(path.slots[0] == 0 && btrfs_header_nritems(path.nodes[0])); + + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]); + + while (key.objectid == BTRFS_DEV_ITEMS_OBJECTID && + key.type == BTRFS_DEV_ITEM_KEY) { + found_devs++; + + ret = btrfs_next_item(chunk_root, &path); + if (ret < 0) + break; + + /* + * This should not happen, as we should have CHUNK items after + * dev items, but since we're only to get the num devices, + * no need to bother the problem. + */ + if (ret > 0) { + ret = 0; + break; + } + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]); + } + btrfs_release_path(&path); + if (ret < 0) + return ret; + return found_devs; +} + +int check_and_repair_super_num_devs(struct btrfs_fs_info *fs_info) +{ + struct btrfs_trans_handle *trans; + int found_devs; + int ret; + + ret = get_num_devs_in_chunk_tree(fs_info); + if (ret < 0) + return ret; + + found_devs = ret; + + if (found_devs == btrfs_super_num_devices(fs_info->super_copy)) + return 0; + + /* Now the found devs in chunk tree mismatch with super block*/ + error("super num devices mismatch, have %llu expect %u", + btrfs_super_num_devices(fs_info->super_copy), + found_devs); + + if (!repair) + return -EUCLEAN; + + /* + * Repair is pretty simple, just reset the super block value and + * commit a new transaction. + */ + trans = btrfs_start_transaction(fs_info->tree_root, 0); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + errno = -ret; + error("failed to start trans: %m"); + return ret; + } + btrfs_set_super_num_devices(fs_info->super_copy, found_devs); + ret = btrfs_commit_transaction(trans, fs_info->tree_root); + if (ret < 0) { + errno = -ret; + error("failed to commit trans: %m"); + btrfs_abort_transaction(trans, ret); + return ret; + } + printf("Successfully reset super num devices to %u\n", found_devs); + return 0; +} diff --git a/check/mode-common.h b/check/mode-common.h index b5e6b727fe73..d5bab85a4f5e 100644 --- a/check/mode-common.h +++ b/check/mode-common.h @@ -201,4 +201,6 @@ int repair_dev_item_bytes_used(struct btrfs_fs_info *fs_info, int fill_csum_tree(struct btrfs_trans_handle *trans, bool search_fs_tree); +int check_and_repair_super_num_devs(struct btrfs_fs_info *fs_info); + #endif
[BUG] There is a bug report of kernel rejecting fs which has a mismatch in super num devices and num devices found in chunk tree. But btrfs-check reports no problem about the fs. [CAUSE] We just didn't verify super num devices against the result found in chunk tree. [FIX] Add such check and repair ability for btrfs-check. The ability is mode independent. Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/ Signed-off-by: Qu Wenruo <wqu@suse.com> --- check/main.c | 1 + check/mode-common.c | 88 +++++++++++++++++++++++++++++++++++++++++++++ check/mode-common.h | 2 ++ 3 files changed, 91 insertions(+)