[4/4] btrfs-progs: test btrfstune -m|M ability to fix previous failures

Message ID db8c6de3dfda46d9e3c0dbebc7f10a898f8be112.1694749532.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Series btrfs-progs: recover from failed metadata_uuid port kernel

Commit Message

Anand Jain Sept. 15, 2023, 4:08 a.m. UTC
The misc-tests/034-metadata-uuid test case has four sets of disk images that
simulate failed writes during btrfstune -m|M operations. As of now, it
tests the kernel only. Update the test case to also verify that btrfstune
-m|M can recover from the same scenarios.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 tests/misc-tests/034-metadata-uuid/test.sh | 70 ++++++++++++++++------
 1 file changed, 53 insertions(+), 17 deletions(-)

Comments

David Sterba Oct. 2, 2023, 5:16 p.m. UTC | #1
On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
> The misc-test/034-metadata_uuid test case, has four sets of disk images to
> simulate failed writes during btrfstune -m|M operations. As of now, this
> tests kernel only. Update the test case to verify btrfstune -m|M's
> capacity to recover from the same scenarios.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  tests/misc-tests/034-metadata-uuid/test.sh | 70 ++++++++++++++++------
>  1 file changed, 53 insertions(+), 17 deletions(-)
> 
> diff --git a/tests/misc-tests/034-metadata-uuid/test.sh b/tests/misc-tests/034-metadata-uuid/test.sh
> index 479c7da7a5b2..0b06f1266f57 100755
> --- a/tests/misc-tests/034-metadata-uuid/test.sh
> +++ b/tests/misc-tests/034-metadata-uuid/test.sh
> @@ -195,13 +195,42 @@ check_multi_fsid_unchanged() {
>  	check_flag_cleared "$1" "$2"
>  }
>  
> -failure_recovery() {
> +failure_recovery_progs() {
> +	local image1
> +	local image2
> +	local loop1
> +	local loop2
> +	local devcount
> +
> +	image1=$(extract_image "$1")
> +	image2=$(extract_image "$2")
> +	loop1=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image1")
> +	loop2=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image2")
> +
> +	run_check $SUDO_HELPER udevadm settle
> +
> +	# Scan to make sure btrfs detects both devices before trying to mount
> +	#run_check "$TOP/btrfstune" -m --noscan --device="$loop1" "$loop2"
> +	run_check "$TOP/btrfstune" -m "$loop2"

This lacks $SUDO_HELPER, so it fails when the whole testsuite is not
run by the root user. Please make sure that 'make test-...' actually works
before sending the patches.
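
A quick way to catch this kind of omission before sending is to scan the test script for run_check lines that call btrfstune without the privilege prefix. The sketch below is only an illustration: find_unprivileged_btrfstune is a hypothetical helper, not part of btrfs-progs, and it assumes the call and its prefix sit on the same line.

```shell
# Hypothetical helper (not part of btrfs-progs): print run_check lines that
# invoke btrfstune without $SUDO_HELPER. Commented-out lines are skipped.
# The pipeline exits non-zero when nothing suspicious is found.
find_unprivileged_btrfstune() {
	grep -n 'run_check.*btrfstune' "$1" \
		| grep -v 'SUDO_HELPER' \
		| grep -v -E '^[0-9]+:[[:space:]]*#'
}

# Usage (path taken from the patch):
#   find_unprivileged_btrfstune tests/misc-tests/034-metadata-uuid/test.sh
```

Each reported line is prefixed with its line number in the script, so the missing prefix can be added directly.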
David Sterba Oct. 2, 2023, 5:19 p.m. UTC | #2
On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
> The misc-test/034-metadata_uuid test case, has four sets of disk images to
> simulate failed writes during btrfstune -m|M operations. As of now, this
> tests kernel only. Update the test case to verify btrfstune -m|M's
> capacity to recover from the same scenarios.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

With all the problems fixed, the test still fails.  I'm not sure which case it
is:

====== RUN CHECK root_helper losetup --find --show ./disk1.raw.restored
/dev/loop0
====== RUN CHECK root_helper losetup --find --show ./disk2.raw.restored
/dev/loop1
====== RUN CHECK root_helper udevadm settle
====== RUN CHECK root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m /dev/loop1
parent transid verify failed on 30425088 wanted 6 found 4
parent transid verify failed on 30441472 wanted 6 found 4
Error writing to device 1
ERROR: failed to write tree block 30457856: Operation not permitted
ERROR: btrfstune failed
failed: root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m /dev/loop1
test failed for case 034-metadata-uuid

Looks like a write that's beyond the device limit. I'll keep the patches
and tests in devel so you can have a look.
Anand Jain Oct. 3, 2023, 8 a.m. UTC | #3
On 3/10/23 01:19, David Sterba wrote:
> On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
>> The misc-test/034-metadata_uuid test case, has four sets of disk images to
>> simulate failed writes during btrfstune -m|M operations. As of now, this
>> tests kernel only. Update the test case to verify btrfstune -m|M's
>> capacity to recover from the same scenarios.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> 
> With all the problems fixed, the test still fails.  I'm not sure which case it
> is:
> 
> ====== RUN CHECK root_helper losetup --find --show ./disk1.raw.restored
> /dev/loop0
> ====== RUN CHECK root_helper losetup --find --show ./disk2.raw.restored
> /dev/loop1
> ====== RUN CHECK root_helper udevadm settle
> ====== RUN CHECK root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m /dev/loop1
> parent transid verify failed on 30425088 wanted 6 found 4
> parent transid verify failed on 30441472 wanted 6 found 4
> Error writing to device 1
> ERROR: failed to write tree block 30457856: Operation not permitted
> ERROR: btrfstune failed
> failed: root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m /dev/loop1
> test failed for case 034-metadata-uuid
> 
> Looks like a write that's beyond the device limit. I'll keep the patches
> and tests in devel so you can have a look.


As a root user, your devel branch passes here.

(Generally, I have been using the following command as root:)

  $ make TEST=034* test-misc
  [LD] fssum
  [LD] fsstress
  [TEST] misc-tests.sh
  [TEST/misc] 034-metadata-uuid
  Scanning /btrfs-progs/tests/misc-tests-results.txt

Let me try as a non-root user.

Also, could you please make sure that all the 
'tests/misc-tests/034-metadata-uuid/*.restored' files are removed before 
starting the test case?

Thanks, Anand
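
For reference, the stale images mentioned above could be cleared with something like the following sketch. clean_restored_images is a hypothetical helper, and the default directory is simply the test path from the patch, assumed to be relative to the top of the btrfs-progs tree.

```shell
# Hypothetical helper: remove leftover *.restored images from a test directory.
# The compressed *.raw.xz source images are left untouched.
clean_restored_images() {
	dir=${1:-tests/misc-tests/034-metadata-uuid}
	# rm -f stays quiet when the glob matches nothing.
	rm -f -- "$dir"/*.restored
}

# Usage, from the top of the source tree:
#   clean_restored_images
```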
Anand Jain Oct. 3, 2023, 8:38 a.m. UTC | #4
On 3/10/23 16:00, Anand Jain wrote:
> 
> 
> On 3/10/23 01:19, David Sterba wrote:
>> On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
>>> The misc-test/034-metadata_uuid test case, has four sets of disk 
>>> images to
>>> simulate failed writes during btrfstune -m|M operations. As of now, this
>>> tests kernel only. Update the test case to verify btrfstune -m|M's
>>> capacity to recover from the same scenarios.
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>
>> With all the problems fixed, the test still fails.  I'm not sure which 
>> case it
>> is:
>>
>> ====== RUN CHECK root_helper losetup --find --show ./disk1.raw.restored
>> /dev/loop0
>> ====== RUN CHECK root_helper losetup --find --show ./disk2.raw.restored
>> /dev/loop1
>> ====== RUN CHECK root_helper udevadm settle
>> ====== RUN CHECK root_helper /labs/dsterba/gits/btrfs-progs/btrfstune 
>> -m /dev/loop1
>> parent transid verify failed on 30425088 wanted 6 found 4
>> parent transid verify failed on 30441472 wanted 6 found 4
>> Error writing to device 1
>> ERROR: failed to write tree block 30457856: Operation not permitted
>> ERROR: btrfstune failed
>> failed: root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m 
>> /dev/loop1
>> test failed for case 034-metadata-uuid
>>
>> Looks like a write that's beyond the device limit. I'll keep the patches
>> and tests in devel so you can have a look.
> 
> 
> As a root user, your devel branch passes here.
> 
> (Generally, I have been using the following command as root:)
> 
>   $ make TEST=034* test-misc
>   [LD] fssum
>   [LD] fsstress
>   [TEST] misc-tests.sh
>   [TEST/misc] 034-metadata-uuid
>   Scanning /btrfs-progs/tests/misc-tests-results.txt
> 
> Let me try as a non-root user.
> 
> Also, could you please make sure that all the 
> 'tests/misc-tests/034-metadata-uuid/*.restored' files are removed before 
> starting the test case?

This passes as non-root.

$ sudo make TEST=034* test-misc
     [LD]     fssum
     [LD]     fsstress
     [TEST]   misc-tests.sh
     [TEST/misc]   034-metadata-uuid
Scanning /btrfs-progs/tests/misc-tests-results.txt

So I think there might be some stale *.restored images; could you please check?

Thanks, Anand
David Sterba Oct. 3, 2023, 5:36 p.m. UTC | #5
On Tue, Oct 03, 2023 at 04:38:49PM +0800, Anand Jain wrote:
> On 3/10/23 16:00, Anand Jain wrote:
> > 
> > 
> > On 3/10/23 01:19, David Sterba wrote:
> >> On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
> >>> The misc-test/034-metadata_uuid test case, has four sets of disk 
> >>> images to
> >>> simulate failed writes during btrfstune -m|M operations. As of now, this
> >>> tests kernel only. Update the test case to verify btrfstune -m|M's
> >>> capacity to recover from the same scenarios.
> >>>
> >>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> >>
> >> With all the problems fixed, the test still fails.  I'm not sure which 
> >> case it
> >> is:
> >>
> >> ====== RUN CHECK root_helper losetup --find --show ./disk1.raw.restored
> >> /dev/loop0
> >> ====== RUN CHECK root_helper losetup --find --show ./disk2.raw.restored
> >> /dev/loop1
> >> ====== RUN CHECK root_helper udevadm settle
> >> ====== RUN CHECK root_helper /labs/dsterba/gits/btrfs-progs/btrfstune 
> >> -m /dev/loop1
> >> parent transid verify failed on 30425088 wanted 6 found 4
> >> parent transid verify failed on 30441472 wanted 6 found 4
> >> Error writing to device 1
> >> ERROR: failed to write tree block 30457856: Operation not permitted
> >> ERROR: btrfstune failed
> >> failed: root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m 
> >> /dev/loop1
> >> test failed for case 034-metadata-uuid
> >>
> >> Looks like a write that's beyond the device limit. I'll keep the patches
> >> and tests in devel so you can have a look.
> > 
> > 
> > As a root user, your devel branch passes here.
> > 
> > (Generally, I have been using the following command as root:)
> > 
> >   $ make TEST=034* test-misc
> >   [LD] fssum
> >   [LD] fsstress
> >   [TEST] misc-tests.sh
> >   [TEST/misc] 034-metadata-uuid
> >   Scanning /btrfs-progs/tests/misc-tests-results.txt
> > 
> > Let me try as a non-root user.
> > 
> > Also, could you please make sure that all the 
> > 'tests/misc-tests/034-metadata-uuid/*.restored' files are removed before 
> > starting the test case?
> 
> This pass as non-root.
> 
> $ sudo make TEST=034* test-misc
>      [LD]     fssum
>      [LD]     fsstress
>      [TEST]   misc-tests.sh
>      [TEST/misc]   034-metadata-uuid
> Scanning /btrfs-progs/tests/misc-tests-results.txt
> 
> So I think there might be some stale *restored images; Could you pls check.

It was indeed something on my side; the test now passes, also in CI.
Patch

diff --git a/tests/misc-tests/034-metadata-uuid/test.sh b/tests/misc-tests/034-metadata-uuid/test.sh
index 479c7da7a5b2..0b06f1266f57 100755
--- a/tests/misc-tests/034-metadata-uuid/test.sh
+++ b/tests/misc-tests/034-metadata-uuid/test.sh
@@ -195,13 +195,42 @@  check_multi_fsid_unchanged() {
 	check_flag_cleared "$1" "$2"
 }
 
-failure_recovery() {
+failure_recovery_progs() {
+	local image1
+	local image2
+	local loop1
+	local loop2
+	local devcount
+
+	image1=$(extract_image "$1")
+	image2=$(extract_image "$2")
+	loop1=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image1")
+	loop2=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image2")
+
+	run_check $SUDO_HELPER udevadm settle
+
+	# Scan to make sure btrfs detects both devices before trying to mount
+	#run_check "$TOP/btrfstune" -m --noscan --device="$loop1" "$loop2"
+	run_check $SUDO_HELPER "$TOP/btrfstune" -m "$loop2"
+
+	# perform any specific check
+	"$3" "$loop1" "$loop2"
+
+	# cleanup
+	run_check $SUDO_HELPER losetup -d "$loop1"
+	run_check $SUDO_HELPER losetup -d "$loop2"
+	rm -f -- "$image1" "$image2"
+}
+
+failure_recovery_kernel() {
 	local image1
 	local image2
 	local loop1
 	local loop2
 	local devcount
 
+	reload_btrfs
+
 	image1=$(extract_image "$1")
 	image2=$(extract_image "$2")
 	loop1=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image1")
@@ -226,47 +255,55 @@  failure_recovery() {
 	rm -f -- "$image1" "$image2"
 }
 
+failure_recovery() {
+	failure_recovery_progs "$@"
+	failure_recovery_kernel "$@"
+}
+
 reload_btrfs() {
 	run_check $SUDO_HELPER rmmod btrfs
 	run_check $SUDO_HELPER modprobe btrfs
 }
 
-# for full coverage we need btrfs to actually be a module
-modinfo btrfs > /dev/null 2>&1 || _not_run "btrfs must be a module"
-run_mayfail $SUDO_HELPER modprobe -r btrfs || _not_run "btrfs must be unloadable"
-run_mayfail $SUDO_HELPER modprobe btrfs || _not_run "loading btrfs module failed"
+test_progs() {
+	run_check_mkfs_test_dev
+	check_btrfstune
+
+	run_check_mkfs_test_dev
+	check_dump_super_output
 
-run_check_mkfs_test_dev
-check_btrfstune
+	run_check_mkfs_test_dev
+	check_image_restore
+}
+
+check_kernel_reloadable() {
+	# for full coverage we need btrfs to actually be a module
+	modinfo btrfs > /dev/null 2>&1 || _not_run "btrfs must be a module"
+	run_mayfail $SUDO_HELPER modprobe -r btrfs || _not_run "btrfs must be unloadable"
+	run_mayfail $SUDO_HELPER modprobe btrfs || _not_run "loading btrfs module failed"
+}
 
-run_check_mkfs_test_dev
-check_dump_super_output
+check_kernel_reloadable
 
-run_check_mkfs_test_dev
-check_image_restore
+test_progs
 
 # disk1 is an image which has no metadata uuid flags set and disk2 is part of
 # the same fs but has the in-progress flag set. Test that whicever is scanned
 # first will result in consistent filesystem.
 failure_recovery "./disk1.raw.xz" "./disk2.raw.xz" check_inprogress_flag
-reload_btrfs
 failure_recovery "./disk2.raw.xz" "./disk1.raw.xz" check_inprogress_flag
 
 # disk4 contains an image in with the in-progress flag set and disk 3 is part
 # of the same filesystem but has both METADATA_UUID incompat and a new
 # metadata uuid set. So disk 3 must always take precedence
-reload_btrfs
 failure_recovery "./disk3.raw.xz" "./disk4.raw.xz" check_completed
-reload_btrfs
 failure_recovery "./disk4.raw.xz" "./disk3.raw.xz" check_completed
 
 # disk5 contains an image which has undergone a successful fsid change more
 # than once, disk6 on the other hand is member of the same filesystem but
 # hasn't completed its last change. Thus it has both the FSID_CHANGING flag set
 # and METADATA_UUID flag set.
-reload_btrfs
 failure_recovery "./disk5.raw.xz" "./disk6.raw.xz" check_multi_fsid_change
-reload_btrfs
 failure_recovery "./disk6.raw.xz" "./disk5.raw.xz" check_multi_fsid_change
 
 # disk7 contains an image which has undergone a successful fsid change once to
@@ -275,5 +312,4 @@  failure_recovery "./disk6.raw.xz" "./disk5.raw.xz" check_multi_fsid_change
 # during the process change. So disk 7 looks as if it never underwent fsid change
 # and disk 8 has FSID_CHANGING_FLAG and METADATA_UUID but is stale.
 failure_recovery "./disk7.raw.xz" "./disk8.raw.xz" check_multi_fsid_unchanged
-reload_btrfs
 failure_recovery "./disk8.raw.xz" "./disk7.raw.xz" check_multi_fsid_unchanged
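
The failure_recovery wrapper in the patch forwards its three arguments (two images and a check function) to both the progs and the kernel variant. A minimal sketch of that forwarding follows; the two stub bodies are hypothetical stand-ins that only print what they receive, and the image name with a space is made up to show why a quoted "$@" is the safer form.

```shell
# Stubs standing in for the real recovery functions (hypothetical bodies).
failure_recovery_progs() { echo "progs: $# args: $1 | $2 | $3"; }
failure_recovery_kernel() { echo "kernel: $# args: $1 | $2 | $3"; }

failure_recovery() {
	# Quoted "$@" passes every argument through unchanged, even when an
	# image path contains whitespace.
	failure_recovery_progs "$@"
	failure_recovery_kernel "$@"
}

failure_recovery "./disk 1.raw.xz" "./disk2.raw.xz" check_inprogress_flag
```

With an unquoted $@ the first path would be split at the space, and each variant would see four arguments instead of three.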