
[2/2] btrfs: rescue/zero-log: Manually write all supers to handle extent tree error more gracefully

Message ID 20191111075059.30352-2-wqu@suse.com (mailing list archive)
State New, archived
Series [1/2] btrfs-progs: Reduce error level from error to warning for OPEN_CTREE_PARTIAL

Commit Message

Qu Wenruo Nov. 11, 2019, 7:50 a.m. UTC
[BUG]
Although "btrfs rescue zero-log" only resets btrfs_super_block::log_root and
btrfs_super_block::log_root_level, we still use a transaction to write all
super blocks for all devices.

This means we can't handle things like a corrupted extent tree:

  checksum verify failed on 2172747776 found 000000B6 wanted 00000000
  checksum verify failed on 2172747776 found 000000B6 wanted 00000000
  bad tree block 2172747776, bytenr mismatch, want=2172747776, have=0
  WARNING: could not setup extent tree, skipping it
  Clearing log on /dev/nvme/btrfs, previous log_root 0, level 0
  ERROR: Corrupted fs, no valid METADATA block group found
  ERROR: attempt to start transaction over already running one

[CAUSE]
The transaction code has an extra check to ensure we have valid
METADATA block groups, and that check fails here.

In fact, we don't really need a transaction at all.

[FIX]
Instead of committing a transaction, we can just call write_all_supers()
manually, so we can still handle multi-device filesystems while avoiding
the above error.

Also, add the OPEN_CTREE_NO_BLOCK_GROUPS open ctree flag to make it more
robust.

Reported-by: Christian Pernegger <pernegger@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 cmds/rescue.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
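
The difference between the two flows can be sketched with a toy model. This
is not btrfs-progs code: every mock_* type and function below is a
hypothetical stand-in, and the only assumption carried over from the commit
message is that the transaction path refuses to start without a valid
METADATA block group, while writing the supers directly does not need one.

```c
/* Toy model only: all mock_* names are hypothetical, not the btrfs-progs API. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_DEVICES 4

struct mock_super {
	uint64_t log_root;
	uint8_t log_root_level;
};

struct mock_fs {
	struct mock_super super_copy;                /* in-memory super block */
	struct mock_super on_disk[MOCK_MAX_DEVICES]; /* one super per device */
	int num_devices;
	bool has_metadata_block_group; /* false when the extent tree is unusable */
};

/* Stand-in for the transaction check described under [CAUSE]. */
static int mock_start_transaction(const struct mock_fs *fs)
{
	assert(fs != NULL);
	return fs->has_metadata_block_group ? 0 : -EIO;
}

/* Stand-in for write_all_supers(): copy the in-memory super to every device. */
static int mock_write_all_supers(struct mock_fs *fs)
{
	assert(fs != NULL);
	if (fs->num_devices < 1 || fs->num_devices > MOCK_MAX_DEVICES)
		return -EINVAL;
	for (int i = 0; i < fs->num_devices; i++)
		fs->on_disk[i] = fs->super_copy;
	return 0;
}

/* Old flow: nothing is cleared or written if the transaction cannot start. */
static int mock_zero_log_with_transaction(struct mock_fs *fs)
{
	int ret = mock_start_transaction(fs);

	if (ret < 0)
		return ret;
	fs->super_copy.log_root = 0;
	fs->super_copy.log_root_level = 0;
	return mock_write_all_supers(fs); /* the commit would write the supers */
}

/* New flow: clear the log fields and write the supers directly. */
static int mock_zero_log_direct(struct mock_fs *fs)
{
	fs->super_copy.log_root = 0;
	fs->super_copy.log_root_level = 0;
	return mock_write_all_supers(fs);
}
```

With has_metadata_block_group set to false, the transaction-based helper
returns an error and leaves every on-disk copy untouched, while the direct
helper clears the log fields on all devices.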

Comments

David Sterba Nov. 15, 2019, 11:40 a.m. UTC | #1
Thanks, v1 has been replaced.

Patch

diff --git a/cmds/rescue.c b/cmds/rescue.c
index e8eab6808bc3..087c33befeff 100644
--- a/cmds/rescue.c
+++ b/cmds/rescue.c
@@ -165,7 +165,6 @@  static int cmd_rescue_zero_log(const struct cmd_struct *cmd,
 			       int argc, char **argv)
 {
 	struct btrfs_root *root;
-	struct btrfs_trans_handle *trans;
 	struct btrfs_super_block *sb;
 	char *devname;
 	int ret;
@@ -187,7 +186,8 @@  static int cmd_rescue_zero_log(const struct cmd_struct *cmd,
 		goto out;
 	}
 
-	root = open_ctree(devname, 0, OPEN_CTREE_WRITES | OPEN_CTREE_PARTIAL);
+	root = open_ctree(devname, 0, OPEN_CTREE_WRITES | OPEN_CTREE_PARTIAL |
+			  OPEN_CTREE_NO_BLOCK_GROUPS);
 	if (!root) {
 		error("could not open ctree");
 		return 1;
@@ -198,13 +198,14 @@  static int cmd_rescue_zero_log(const struct cmd_struct *cmd,
 			devname,
 			(unsigned long long)btrfs_super_log_root(sb),
 			(unsigned)btrfs_super_log_root_level(sb));
-	trans = btrfs_start_transaction(root, 1);
-	BUG_ON(IS_ERR(trans));
 	btrfs_set_super_log_root(sb, 0);
 	btrfs_set_super_log_root_level(sb, 0);
-	btrfs_commit_transaction(trans, root);
+	ret = write_all_supers(root->fs_info);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to write dev supers: %m");
+	}
 	close_ctree(root);
-
 out:
 	return !!ret;
 }
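
The error path in the last hunk stores the negated return value in errno so
that the glibc-specific %m conversion can print the matching error text, and
the final !!ret collapses any non-zero status to exit code 1. A minimal
portable sketch of the same convention, using strerror() in place of %m; the
helper names format_neg_errno() and to_exit_status() are hypothetical and
not part of btrfs-progs:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "prefix: <error text>" from a negative
 * errno-style return value, similar in spirit to setting errno = -ret
 * and printing with %m.
 */
static int format_neg_errno(char *buf, size_t len, const char *prefix, int ret)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s: %s", prefix, strerror(-ret));
}

/* Hypothetical helper: same effect as "return !!ret" in the patch. */
static int to_exit_status(int ret)
{
	return !!ret;
}
```

Calling format_neg_errno() with -EIO produces the prefix followed by the
platform's text for EIO, and to_exit_status() maps both -EIO and any other
non-zero value to 1.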