Message ID | db8c6de3dfda46d9e3c0dbebc7f10a898f8be112.1694749532.git.anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs-progs: recover from failed metadata_uuid port kernel |
On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
> The misc-test/034-metadata_uuid test case, has four sets of disk images to
> simulate failed writes during btrfstune -m|M operations. As of now, this
> tests kernel only. Update the test case to verify btrfstune -m|M's
> capacity to recover from the same scenarios.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  tests/misc-tests/034-metadata-uuid/test.sh | 70 ++++++++++++++++------
>  1 file changed, 53 insertions(+), 17 deletions(-)
>
> diff --git a/tests/misc-tests/034-metadata-uuid/test.sh b/tests/misc-tests/034-metadata-uuid/test.sh
> index 479c7da7a5b2..0b06f1266f57 100755
> --- a/tests/misc-tests/034-metadata-uuid/test.sh
> +++ b/tests/misc-tests/034-metadata-uuid/test.sh
> @@ -195,13 +195,42 @@ check_multi_fsid_unchanged() {
>  	check_flag_cleared "$1" "$2"
>  }
>
> -failure_recovery() {
> +failure_recovery_progs() {
> +	local image1
> +	local image2
> +	local loop1
> +	local loop2
> +	local devcount
> +
> +	image1=$(extract_image "$1")
> +	image2=$(extract_image "$2")
> +	loop1=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image1")
> +	loop2=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image2")
> +
> +	run_check $SUDO_HELPER udevadm settle
> +
> +	# Scan to make sure btrfs detects both devices before trying to mount
> +	#run_check "$TOP/btrfstune" -m --noscan --device="$loop1" "$loop2"
> +	run_check "$TOP/btrfstune" -m "$loop2"

This lacks $SUDO_HELPER, so it fails when the whole test suite is not run by
the root user. Please make sure that 'make test-...' actually works before
sending the patches.

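A minimal sketch of the change the review asks for: prefix the btrfstune call with $SUDO_HELPER, as the neighbouring losetup and udevadm calls already do. The run_check function, the SUDO_HELPER value and the TOP path below are simplified stand-ins (assumptions) so that the snippet runs outside the test suite; they are not the real helpers from the btrfs-progs tests.

```shell
# Stand-ins so the sketch is self-contained (assumptions, not the real helpers).
SUDO_HELPER="echo root_helper"   # prints the command instead of elevating
TOP="/path/to/btrfs-progs"       # placeholder build directory
loop2="/dev/loop1"               # placeholder loop device

run_check() {
	# Run the command and abort on failure, loosely mimicking the real helper.
	"$@" || { echo "failed: $*" >&2; exit 1; }
}

# As posted, btrfstune runs with the caller's privileges:
#   run_check "$TOP/btrfstune" -m "$loop2"
# With the helper prefixed, it goes through the same path as losetup:
tune_cmd() {
	run_check $SUDO_HELPER "$TOP/btrfstune" -m "$loop2"
}

tune_cmd
```

With the echo stand-in, the command line printed has the same shape as the "RUN CHECK root_helper ... btrfstune -m /dev/loop1" entry in the test log quoted later in the thread.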
On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
> The misc-test/034-metadata_uuid test case, has four sets of disk images to
> simulate failed writes during btrfstune -m|M operations. As of now, this
> tests kernel only. Update the test case to verify btrfstune -m|M's
> capacity to recover from the same scenarios.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

With all the problems fixed, the test still fails. I'm not sure which case it
is:

====== RUN CHECK root_helper losetup --find --show ./disk1.raw.restored
/dev/loop0
====== RUN CHECK root_helper losetup --find --show ./disk2.raw.restored
/dev/loop1
====== RUN CHECK root_helper udevadm settle
====== RUN CHECK root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m /dev/loop1
parent transid verify failed on 30425088 wanted 6 found 4
parent transid verify failed on 30441472 wanted 6 found 4
Error writing to device 1
ERROR: failed to write tree block 30457856: Operation not permitted
ERROR: btrfstune failed
failed: root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m /dev/loop1
test failed for case 034-metadata-uuid

Looks like a write that's beyond the device limit. I'll keep the patches
and tests in devel so you can have a look.

On 3/10/23 01:19, David Sterba wrote:
> On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
>> The misc-test/034-metadata_uuid test case, has four sets of disk images to
>> simulate failed writes during btrfstune -m|M operations. As of now, this
>> tests kernel only. Update the test case to verify btrfstune -m|M's
>> capacity to recover from the same scenarios.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>
> With all the problems fixed, the test still fails. I'm not sure which case it
> is:
>
> ====== RUN CHECK root_helper losetup --find --show ./disk1.raw.restored
> /dev/loop0
> ====== RUN CHECK root_helper losetup --find --show ./disk2.raw.restored
> /dev/loop1
> ====== RUN CHECK root_helper udevadm settle
> ====== RUN CHECK root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m /dev/loop1
> parent transid verify failed on 30425088 wanted 6 found 4
> parent transid verify failed on 30441472 wanted 6 found 4
> Error writing to device 1
> ERROR: failed to write tree block 30457856: Operation not permitted
> ERROR: btrfstune failed
> failed: root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m /dev/loop1
> test failed for case 034-metadata-uuid
>
> Looks like a write that's beyond the device limit. I'll keep the patches
> and tests in devel so you can have a look.

As a root user, your devel branch passes here.

(Generally, I have been using the following command as root:)

$ make TEST=034* test-misc
    [LD] fssum
    [LD] fsstress
    [TEST] misc-tests.sh
    [TEST/misc] 034-metadata-uuid
Scanning /btrfs-progs/tests/misc-tests-results.txt

Let me try as a non-root user.

Also, could you please make sure that all the
'tests/misc-tests/034-metadata-uuid/*.restored' files are removed before
starting the test case?

Thanks, Anand

On 3/10/23 16:00, Anand Jain wrote:
>
>
> On 3/10/23 01:19, David Sterba wrote:
>> On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
>>> The misc-test/034-metadata_uuid test case, has four sets of disk
>>> images to
>>> simulate failed writes during btrfstune -m|M operations. As of now, this
>>> tests kernel only. Update the test case to verify btrfstune -m|M's
>>> capacity to recover from the same scenarios.
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>
>> With all the problems fixed, the test still fails. I'm not sure which
>> case it
>> is:
>>
>> ====== RUN CHECK root_helper losetup --find --show ./disk1.raw.restored
>> /dev/loop0
>> ====== RUN CHECK root_helper losetup --find --show ./disk2.raw.restored
>> /dev/loop1
>> ====== RUN CHECK root_helper udevadm settle
>> ====== RUN CHECK root_helper /labs/dsterba/gits/btrfs-progs/btrfstune
>> -m /dev/loop1
>> parent transid verify failed on 30425088 wanted 6 found 4
>> parent transid verify failed on 30441472 wanted 6 found 4
>> Error writing to device 1
>> ERROR: failed to write tree block 30457856: Operation not permitted
>> ERROR: btrfstune failed
>> failed: root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m
>> /dev/loop1
>> test failed for case 034-metadata-uuid
>>
>> Looks like a write that's beyond the device limit. I'll keep the patches
>> and tests in devel so you can have a look.
>
>
> As a root user, your devel branch passes here.
>
> (Generally, I have been using the following command as root:)
>
> $ make TEST=034* test-misc
>     [LD] fssum
>     [LD] fsstress
>     [TEST] misc-tests.sh
>     [TEST/misc] 034-metadata-uuid
> Scanning /btrfs-progs/tests/misc-tests-results.txt
>
> Let me try as a non-root user.
>
> Also, could you please make sure that all the
> 'tests/misc-tests/034-metadata-uuid/*.restored' files are removed before
> starting the test case?

This passes as non-root.

$ sudo make TEST=034* test-misc
    [LD] fssum
    [LD] fsstress
    [TEST] misc-tests.sh
    [TEST/misc] 034-metadata-uuid
Scanning /btrfs-progs/tests/misc-tests-results.txt

So I think there might be some stale *restored images; could you please check?

Thanks, Anand

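The request above can be sketched as a small cleanup step run before the test. clean_restored_images is a hypothetical helper name, not something the test suite provides; the directory and the *.restored pattern are the ones quoted in the earlier message.

```shell
# Hypothetical helper (assumption): remove leftover extracted images
# from a previous, possibly aborted, run of the test case.
clean_restored_images() {
	# $1: directory holding the *.restored files
	# rm -f stays silent and succeeds when nothing matches the pattern.
	rm -f -- "$1"/*.restored
}

# Run from the top of the btrfs-progs tree before 'make TEST=034* test-misc'.
clean_restored_images tests/misc-tests/034-metadata-uuid
```

Only files ending in .restored are touched; the compressed disk*.raw.xz source images stay in place.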
On Tue, Oct 03, 2023 at 04:38:49PM +0800, Anand Jain wrote:
> On 3/10/23 16:00, Anand Jain wrote:
> >
> >
> > On 3/10/23 01:19, David Sterba wrote:
> >> On Fri, Sep 15, 2023 at 12:08:59PM +0800, Anand Jain wrote:
> >>> The misc-test/034-metadata_uuid test case, has four sets of disk
> >>> images to
> >>> simulate failed writes during btrfstune -m|M operations. As of now, this
> >>> tests kernel only. Update the test case to verify btrfstune -m|M's
> >>> capacity to recover from the same scenarios.
> >>>
> >>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> >>
> >> With all the problems fixed, the test still fails. I'm not sure which
> >> case it
> >> is:
> >>
> >> ====== RUN CHECK root_helper losetup --find --show ./disk1.raw.restored
> >> /dev/loop0
> >> ====== RUN CHECK root_helper losetup --find --show ./disk2.raw.restored
> >> /dev/loop1
> >> ====== RUN CHECK root_helper udevadm settle
> >> ====== RUN CHECK root_helper /labs/dsterba/gits/btrfs-progs/btrfstune
> >> -m /dev/loop1
> >> parent transid verify failed on 30425088 wanted 6 found 4
> >> parent transid verify failed on 30441472 wanted 6 found 4
> >> Error writing to device 1
> >> ERROR: failed to write tree block 30457856: Operation not permitted
> >> ERROR: btrfstune failed
> >> failed: root_helper /labs/dsterba/gits/btrfs-progs/btrfstune -m
> >> /dev/loop1
> >> test failed for case 034-metadata-uuid
> >>
> >> Looks like a write that's beyond the device limit. I'll keep the patches
> >> and tests in devel so you can have a look.
> >
> >
> > As a root user, your devel branch passes here.
> >
> > (Generally, I have been using the following command as root:)
> >
> > $ make TEST=034* test-misc
> >     [LD] fssum
> >     [LD] fsstress
> >     [TEST] misc-tests.sh
> >     [TEST/misc] 034-metadata-uuid
> > Scanning /btrfs-progs/tests/misc-tests-results.txt
> >
> > Let me try as a non-root user.
> >
> > Also, could you please make sure that all the
> > 'tests/misc-tests/034-metadata-uuid/*.restored' files are removed before
> > starting the test case?
>
> This passes as non-root.
>
> $ sudo make TEST=034* test-misc
>     [LD] fssum
>     [LD] fsstress
>     [TEST] misc-tests.sh
>     [TEST/misc] 034-metadata-uuid
> Scanning /btrfs-progs/tests/misc-tests-results.txt
>
> So I think there might be some stale *restored images; could you please check?

It was indeed something on my side; the test now passes, including in CI.

diff --git a/tests/misc-tests/034-metadata-uuid/test.sh b/tests/misc-tests/034-metadata-uuid/test.sh
index 479c7da7a5b2..0b06f1266f57 100755
--- a/tests/misc-tests/034-metadata-uuid/test.sh
+++ b/tests/misc-tests/034-metadata-uuid/test.sh
@@ -195,13 +195,42 @@ check_multi_fsid_unchanged() {
 	check_flag_cleared "$1" "$2"
 }
 
-failure_recovery() {
+failure_recovery_progs() {
+	local image1
+	local image2
+	local loop1
+	local loop2
+	local devcount
+
+	image1=$(extract_image "$1")
+	image2=$(extract_image "$2")
+	loop1=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image1")
+	loop2=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image2")
+
+	run_check $SUDO_HELPER udevadm settle
+
+	# Scan to make sure btrfs detects both devices before trying to mount
+	#run_check "$TOP/btrfstune" -m --noscan --device="$loop1" "$loop2"
+	run_check "$TOP/btrfstune" -m "$loop2"
+
+	# perform any specific check
+	"$3" "$loop1" "$loop2"
+
+	# cleanup
+	run_check $SUDO_HELPER losetup -d "$loop1"
+	run_check $SUDO_HELPER losetup -d "$loop2"
+	rm -f -- "$image1" "$image2"
+}
+
+failure_recovery_kernel() {
 	local image1
 	local image2
 	local loop1
 	local loop2
 	local devcount
 
+	reload_btrfs
+
 	image1=$(extract_image "$1")
 	image2=$(extract_image "$2")
 	loop1=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image1")
@@ -226,47 +255,55 @@ failure_recovery() {
 	rm -f -- "$image1" "$image2"
 }
 
+failure_recovery() {
+	failure_recovery_progs $@
+	failure_recovery_kernel $@
+}
+
 reload_btrfs() {
 	run_check $SUDO_HELPER rmmod btrfs
 	run_check $SUDO_HELPER modprobe btrfs
 }
 
-# for full coverage we need btrfs to actually be a module
-modinfo btrfs > /dev/null 2>&1 || _not_run "btrfs must be a module"
-run_mayfail $SUDO_HELPER modprobe -r btrfs || _not_run "btrfs must be unloadable"
-run_mayfail $SUDO_HELPER modprobe btrfs || _not_run "loading btrfs module failed"
+test_progs() {
+	run_check_mkfs_test_dev
+	check_btrfstune
+
+	run_check_mkfs_test_dev
+	check_dump_super_output
 
-run_check_mkfs_test_dev
-check_btrfstune
+	run_check_mkfs_test_dev
+	check_image_restore
+}
+
+check_kernel_reloadable() {
+	# for full coverage we need btrfs to actually be a module
+	modinfo btrfs > /dev/null 2>&1 || _not_run "btrfs must be a module"
+	run_mayfail $SUDO_HELPER modprobe -r btrfs || _not_run "btrfs must be unloadable"
+	run_mayfail $SUDO_HELPER modprobe btrfs || _not_run "loading btrfs module failed"
+}
 
-run_check_mkfs_test_dev
-check_dump_super_output
+check_kernel_reloadable
 
-run_check_mkfs_test_dev
-check_image_restore
+test_progs
 
 # disk1 is an image which has no metadata uuid flags set and disk2 is part of
 # the same fs but has the in-progress flag set. Test that whicever is scanned
 # first will result in consistent filesystem.
 failure_recovery "./disk1.raw.xz" "./disk2.raw.xz" check_inprogress_flag
-reload_btrfs
 failure_recovery "./disk2.raw.xz" "./disk1.raw.xz" check_inprogress_flag
 
 # disk4 contains an image in with the in-progress flag set and disk 3 is part
 # of the same filesystem but has both METADATA_UUID incompat and a new
 # metadata uuid set. So disk 3 must always take precedence
-reload_btrfs
 failure_recovery "./disk3.raw.xz" "./disk4.raw.xz" check_completed
-reload_btrfs
 failure_recovery "./disk4.raw.xz" "./disk3.raw.xz" check_completed
 
 # disk5 contains an image which has undergone a successful fsid change more
 # than once, disk6 on the other hand is member of the same filesystem but
 # hasn't completed its last change. Thus it has both the FSID_CHANGING flag set
 # and METADATA_UUID flag set.
-reload_btrfs
 failure_recovery "./disk5.raw.xz" "./disk6.raw.xz" check_multi_fsid_change
-reload_btrfs
 failure_recovery "./disk6.raw.xz" "./disk5.raw.xz" check_multi_fsid_change
 
 # disk7 contains an image which has undergone a successful fsid change once to
@@ -275,5 +312,4 @@ failure_recovery "./disk6.raw.xz" "./disk5.raw.xz" check_multi_fsid_change
 # during the process change. So disk 7 looks as if it never underwent fsid change
 # and disk 8 has FSID_CHANGING_FLAG and METADATA_UUID but is stale.
 failure_recovery "./disk7.raw.xz" "./disk8.raw.xz" check_multi_fsid_unchanged
-reload_btrfs
 failure_recovery "./disk8.raw.xz" "./disk7.raw.xz" check_multi_fsid_unchanged

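The new failure_recovery wrapper forwards its arguments with an unquoted $@. That works for the call sites in this test, whose arguments contain no whitespace, but the usual defensive form is "$@". A small sketch of the difference follows; count_args, forward_unquoted and forward_quoted are hypothetical names used only for illustration, as is the image name containing a space.

```shell
# Hypothetical helpers (assumptions) showing how arguments are forwarded.
count_args() {
	# Print how many arguments were received.
	echo "$#"
}

forward_unquoted() {
	# Unquoted: each argument is split again on whitespace.
	count_args $@
}

forward_quoted() {
	# Quoted: each argument is passed through unchanged.
	count_args "$@"
}

forward_unquoted "./disk 1.raw.xz" "./disk2.raw.xz" check_inprogress_flag   # prints 4
forward_quoted "./disk 1.raw.xz" "./disk2.raw.xz" check_inprogress_flag     # prints 3
```

With the arguments used in the test (for example "./disk1.raw.xz" "./disk2.raw.xz" check_inprogress_flag) both forms pass three arguments, so the patch behaves as intended there.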
The misc-test/034-metadata_uuid test case, has four sets of disk images to
simulate failed writes during btrfstune -m|M operations. As of now, this
tests kernel only. Update the test case to verify btrfstune -m|M's
capacity to recover from the same scenarios.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 tests/misc-tests/034-metadata-uuid/test.sh | 70 ++++++++++++++++------
 1 file changed, 53 insertions(+), 17 deletions(-)