[v2,1/2] btrfs-progs: warn users about the possible dangers of check --repair
diff mbox series

Message ID 20191018111604.16463-1-jthumshirn@suse.de
State New
Headers show
Series
  • [v2,1/2] btrfs-progs: warn users about the possible dangers of check --repair
Related show

Commit Message

Johannes Thumshirn Oct. 18, 2019, 11:16 a.m. UTC
The manual page of btrfsck clearly states 'btrfs check --repair' is a
dangerous operation.

Although this warning is in place users do not read the manual page and/or
are used to the behaviour of fsck utilities which repair the filesystem,
and thus potentially cause harm.

Similar to 'btrfs balance' without any filters, add a warning and a
countdown, so users can bail out before eventual corrupting the filesystem
more than it already is.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>

---
Changes to v1:
- Fix grammar mistakes in warning message
- Skip delay with --force
- Adjust tests to cope with btrfsck --repair --force
---
 check/main.c                                       | 23 ++++++++++++++++------
 tests/cli-tests/007-check-force/test.sh            |  2 --
 tests/fsck-tests/013-extent-tree-rebuild/test.sh   |  2 +-
 tests/fsck-tests/032-corrupted-qgroup/test.sh      |  2 +-
 tests/fuzz-tests/003-multi-check-unmounted/test.sh |  2 +-
 5 files changed, 20 insertions(+), 11 deletions(-)

Comments

David Sterba Oct. 21, 2019, 3:22 p.m. UTC | #1
On Fri, Oct 18, 2019 at 01:16:03PM +0200, Johannes Thumshirn wrote:
> The manual page of btrfsck clearly states 'btrfs check --repair' is a
> dangerous operation.
> 
> Although this warning is in place users do not read the manual page and/or
> are used to the behaviour of fsck utilities which repair the filesystem,
> and thus potentially cause harm.
> 
> Similar to 'btrfs balance' without any filters, add a warning and a
> countdown, so users can bail out before eventual corrupting the filesystem
> more than it already is.
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> 
> ---
> Changes to v1:
> - Fix grammar mistakes in warning message
> - Skip delay with --force

--force was added for a different reason, to allow check on a mounted
filesystem. I don't think that combining --repair and --force just to
allow repair is a good idea. There's a 'dangerous repair' mode for eg.
xfs that allows to do live surgery on a mounted filesytem (followed by
immediate reboot). We want to be able to do that eventually.

I understand where the motivation comes from, let me have a second
thought on that.
Johannes Thumshirn Oct. 22, 2019, 7:33 a.m. UTC | #2
On 21/10/2019 17:22, David Sterba wrote:
> --force was added for a different reason, to allow check on a mounted
> filesystem. I don't think that combining --repair and --force just to
> allow repair is a good idea. There's a 'dangerous repair' mode for eg.
> xfs that allows to do live surgery on a mounted filesytem (followed by
> immediate reboot). We want to be able to do that eventually.
> 
> I understand where the motivation comes from, let me have a second
> thought on that.

So how about adding a '--yes' or '--accept', '--dangerous',
'--allow-dangeruos' parameter instead of force to skip the warning?

My vote would go for '--allow-dangerous'.

Byte,
	Johannes
Qu Wenruo Oct. 22, 2019, 7:37 a.m. UTC | #3
On 2019/10/22 下午3:33, Johannes Thumshirn wrote:
> On 21/10/2019 17:22, David Sterba wrote:
>> --force was added for a different reason, to allow check on a mounted
>> filesystem. I don't think that combining --repair and --force just to
>> allow repair is a good idea. There's a 'dangerous repair' mode for eg.
>> xfs that allows to do live surgery on a mounted filesytem (followed by
>> immediate reboot). We want to be able to do that eventually.
>>
>> I understand where the motivation comes from, let me have a second
>> thought on that.
>
> So how about adding a '--yes' or '--accept', '--dangerous',
> '--allow-dangeruos' parameter instead of force to skip the warning?
>
> My vote would go for '--allow-dangerous'.

+1 for '--yes', at least e2fsck has a similar '-y' option.

Thanks,
Qu

>
> Byte,
> 	Johannes
>
Johannes Thumshirn Oct. 22, 2019, 7:45 a.m. UTC | #4
On 22/10/2019 09:37, Qu Wenruo wrote:
> +1 for '--yes', at least e2fsck has a similar '-y' option.

OK, for me as well. And yes being in line with e2fsck has it's benefits
as well.

Byte,
	Johannes
Nikolay Borisov Oct. 22, 2019, 7:50 a.m. UTC | #5
On 22.10.19 г. 10:37 ч., Qu Wenruo wrote:
> 
> 
> On 2019/10/22 下午3:33, Johannes Thumshirn wrote:
>> On 21/10/2019 17:22, David Sterba wrote:
>>> --force was added for a different reason, to allow check on a mounted
>>> filesystem. I don't think that combining --repair and --force just to
>>> allow repair is a good idea. There's a 'dangerous repair' mode for eg.
>>> xfs that allows to do live surgery on a mounted filesytem (followed by
>>> immediate reboot). We want to be able to do that eventually.
>>>
>>> I understand where the motivation comes from, let me have a second
>>> thought on that.
>>
>> So how about adding a '--yes' or '--accept', '--dangerous',
>> '--allow-dangeruos' parameter instead of force to skip the warning?
>>
>> My vote would go for '--allow-dangerous'.
> 
> +1 for '--yes', at least e2fsck has a similar '-y' option.

I'\m fine with --yes/-y. It's not just e2fsck but I've seen similar
options in other tools as well.

> 
> Thanks,
> Qu
> 
>>
>> Byte,
>> 	Johannes
>>
David Sterba Oct. 22, 2019, 12:19 p.m. UTC | #6
On Tue, Oct 22, 2019 at 09:33:06AM +0200, Johannes Thumshirn wrote:
> On 21/10/2019 17:22, David Sterba wrote:
> > --force was added for a different reason, to allow check on a mounted
> > filesystem. I don't think that combining --repair and --force just to
> > allow repair is a good idea. There's a 'dangerous repair' mode for eg.
> > xfs that allows to do live surgery on a mounted filesytem (followed by
> > immediate reboot). We want to be able to do that eventually.
> > 
> > I understand where the motivation comes from, let me have a second
> > thought on that.
> 
> So how about adding a '--yes' or '--accept', '--dangerous',
> '--allow-dangeruos' parameter instead of force to skip the warning?
> 
> My vote would go for '--allow-dangerous'.

So, I agree with the above. The dangerous repair should be something
almost nobody does or should do, so a very long option name is just
fine. This leaves -f for --repair to skip the warning. We now have:

* btrfs check - read-only by default, no changes

* btrfs check --read-only - same as above, explicit about RO

* btrfs check --repair - warning with a timeout, then repair

* btrfs check --repair -f - no warning (or the warning could be still
                            printed but without timeout)

I'd rather avoid options that would be confusing to what are they
referring to. So '--yes' it's like don't ask questions before repairing,
that's what e2fsck does but that's different from the initial warning.
And so on.

The dangerous repair would need a full set of the options, so

* btrfs --repair -f --allow-dangerous
David Sterba Nov. 15, 2019, 12:53 p.m. UTC | #7
On Fri, Oct 18, 2019 at 01:16:03PM +0200, Johannes Thumshirn wrote:
> The manual page of btrfsck clearly states 'btrfs check --repair' is a
> dangerous operation.
> 
> Although this warning is in place users do not read the manual page and/or
> are used to the behaviour of fsck utilities which repair the filesystem,
> and thus potentially cause harm.
> 
> Similar to 'btrfs balance' without any filters, add a warning and a
> countdown, so users can bail out before eventual corrupting the filesystem
> more than it already is.
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>

For the record, this is on the way to 5.4. Thanks.

Patch
diff mbox series

diff --git a/check/main.c b/check/main.c
index fd05430c1f51..1fecfc37c135 100644
--- a/check/main.c
+++ b/check/main.c
@@ -9970,6 +9970,23 @@  static int cmd_check(const struct cmd_struct *cmd, int argc, char **argv)
 		exit(1);
 	}
 
+	if (repair && !force) {
+		int delay = 10;
+		printf("WARNING:\n\n");
+		printf("\tDo not use --repair unless you are advised to do so by a developer\n");
+		printf("\tor an experienced user, and then only after having accepted that no\n");
+		printf("\tfsck can successfully repair all types of filesystem corruption. Eg.\n");
+		printf("\tsome software or hardware bugs can fatally damage a volume.\n");
+		printf("\tThe operation will start in %d seconds.\n", delay);
+		printf("\tUse Ctrl-C to stop it.\n");
+		while (delay) {
+			printf("%2d", delay--);
+			fflush(stdout);
+			sleep(1);
+		}
+		printf("\nStarting repair.\n");
+	}
+
 	/*
 	 * experimental and dangerous
 	 */
@@ -9998,12 +10015,6 @@  static int cmd_check(const struct cmd_struct *cmd, int argc, char **argv)
 			goto err_out;
 		}
 	} else {
-		if (repair) {
-			error("repair and --force is not yet supported");
-			ret = 1;
-			err |= !!ret;
-			goto err_out;
-		}
 		if (ret < 0) {
 			warning(
 "cannot check mount status of %s, the filesystem could be mounted, continuing because of --force",
diff --git a/tests/cli-tests/007-check-force/test.sh b/tests/cli-tests/007-check-force/test.sh
index 317b8cf42f83..6025b8545c52 100755
--- a/tests/cli-tests/007-check-force/test.sh
+++ b/tests/cli-tests/007-check-force/test.sh
@@ -26,7 +26,5 @@  run_mustfail "checking mounted filesystem with --force --repair" \
 run_check_umount_test_dev
 run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
 run_check $SUDO_HELPER "$TOP/btrfs" check --force "$TEST_DEV"
-run_mustfail "--force --repair on unmounted filesystem" \
-	$SUDO_HELPER "$TOP/btrfs" check --force --repair "$TEST_DEV"
 
 cleanup_loopdevs
diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
index ac5a406a8b8a..33beb8bf55b4 100755
--- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh
+++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
@@ -35,7 +35,7 @@  test_extent_tree_rebuild()
 
 	$SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV" >& /dev/null && \
 			_fail "btrfs check should detect failure"
-	run_check $SUDO_HELPER "$TOP/btrfs" check --repair --init-extent-tree "$TEST_DEV"
+	run_check $SUDO_HELPER "$TOP/btrfs" check --repair --force --init-extent-tree "$TEST_DEV"
 	run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
 }
 
diff --git a/tests/fsck-tests/032-corrupted-qgroup/test.sh b/tests/fsck-tests/032-corrupted-qgroup/test.sh
index 4bfa36013e81..91bbd51a4ebd 100755
--- a/tests/fsck-tests/032-corrupted-qgroup/test.sh
+++ b/tests/fsck-tests/032-corrupted-qgroup/test.sh
@@ -13,7 +13,7 @@  check_image() {
 		     "$TOP/btrfs" check "$1"
 	# Above command can fail due to other bugs, so add extra check to
 	# ensure we can fix qgroup without problems.
-	run_check "$TOP/btrfs" check --repair "$1"
+	run_check "$TOP/btrfs" check --repair --force "$1"
 }
 
 check_all_images
diff --git a/tests/fuzz-tests/003-multi-check-unmounted/test.sh b/tests/fuzz-tests/003-multi-check-unmounted/test.sh
index 3021c3a84968..176272e508d7 100755
--- a/tests/fuzz-tests/003-multi-check-unmounted/test.sh
+++ b/tests/fuzz-tests/003-multi-check-unmounted/test.sh
@@ -18,7 +18,7 @@  check_image() {
 	run_mayfail $TOP/btrfs check --init-extent-tree "$image"
 	run_mayfail $TOP/btrfs check --check-data-csum "$image"
 	run_mayfail $TOP/btrfs check --subvol-extents "$image"
-	run_mayfail $TOP/btrfs check --repair "$image"
+	run_mayfail $TOP/btrfs check --repair --force "$image"
 }
 
 check_all_images "$TEST_TOP/fuzz-tests/images"