Message ID | 20200421113643.24224-1-ailiop@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | common/fuzzy: don't attempt online scrubbing unless supported | expand |
On Tue, Apr 21, 2020 at 01:36:42PM +0200, Anthony Iliopoulos wrote: > Many xfs metadata fuzzing tests invoke xfs_scrub to detect online errors > even when _scratch_xfs_fuzz_metadata is invoked with "offline". This > causes those tests to fail with output mismatches on kernels that don't > enable CONFIG_XFS_ONLINE_SCRUB. Bypass scrubbing when not supported. > > Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> > --- > common/fuzzy | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/common/fuzzy b/common/fuzzy > index 988203b1..83ddc3e8 100644 > --- a/common/fuzzy > +++ b/common/fuzzy > @@ -238,7 +238,7 @@ __scratch_xfs_fuzz_field_test() { > if [ $res -eq 0 ]; then > # Try an online scrub unless we're fuzzing ag 0's sb, > # which scrub doesn't know how to fix. > - if [ "${repair}" != "none" ]; then > + if _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}"; then Huh? This changes the behavior of the repair=none fuzz tests, which mutate filesystems and then write to them without running any checking tools whatsoever to see if we can trip over the mutations with only regular filesystem operations. Whereas before, we'd skip xfs_scrub, now we always run it if it's supported. The open-coded repair conditionals scattered through this funciton probably ought to be refactored into helpers, e.g. __fuzz_want_scrub_check() { local repair="$1" local field="$2" test "${repair}" != "none" && \ _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}" && \ [ "${field}" != "sb 0" ] } if [ $res -eq 0 ]; then # Try an online scrub... if __fuzz_want_scrub_check "${repair}" "${field}"; then _scratch_scrub -n -a 1 -e continue 2>&1 ... --D > echo "++ Online scrub" > if [ "$1" != "sb 0" ]; then > _scratch_scrub -n -e continue 2>&1 > -- > 2.26.2 >
On Tue, Apr 21, 2020 at 08:37:17AM -0700, Darrick J. Wong wrote: > On Tue, Apr 21, 2020 at 01:36:42PM +0200, Anthony Iliopoulos wrote: > > Many xfs metadata fuzzing tests invoke xfs_scrub to detect online errors > > even when _scratch_xfs_fuzz_metadata is invoked with "offline". This > > causes those tests to fail with output mismatches on kernels that don't > > enable CONFIG_XFS_ONLINE_SCRUB. Bypass scrubbing when not supported. > > > > Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> > > --- > > common/fuzzy | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/common/fuzzy b/common/fuzzy > > index 988203b1..83ddc3e8 100644 > > --- a/common/fuzzy > > +++ b/common/fuzzy > > @@ -238,7 +238,7 @@ __scratch_xfs_fuzz_field_test() { > > if [ $res -eq 0 ]; then > > # Try an online scrub unless we're fuzzing ag 0's sb, > > # which scrub doesn't know how to fix. > > - if [ "${repair}" != "none" ]; then > > + if _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}"; then > > Huh? > > This changes the behavior of the repair=none fuzz tests, which mutate > filesystems and then write to them without running any checking tools > whatsoever to see if we can trip over the mutations with only regular > filesystem operations. Whereas before, we'd skip xfs_scrub, now we > always run it if it's supported. oops...right, we want to let the verifiers catch the errors here. Speaking of which, I've been staring at the scripts but it's not clear to me how the repair=none fuzz tests are expected to function. Many of those tests corrupt AG metadata headers (e.g. the AGI lsn), which means mount bails out with an error. But the golden output doesn't account for that, so those tests will fail (e.g. xfs/456). Further, for things like inode metadata fuzzing where the fs is usually mountable, the tests will always succeed irrespective of the verifiers firing or not (e.g. xfs/465). I'd assume all those repair=none tests would need to check dmesg for metadata corruptions, so something like: --- a/common/fuzzy +++ b/common/fuzzy @@ -254,3 +254,3 @@ __scratch_xfs_fuzz_field_test() { __scratch_xfs_fuzz_unmount - else + elif [ "${repair}" != "none" ]; then (>&2 echo "re-mount failed ($res) with ${field} = ${fuzzverb}.") --- a/tests/xfs/456 +++ b/tests/xfs/456 @@ -43,2 +43,5 @@ echo "Done fuzzing AGI" +_check_dmesg_for "Metadata corruption detected" || \ + _fail "Missing metadata corruption messages!" + # success, all done If this makes sense, I'll send a separate patch to address it, and fix all repair=none tests as above. > The open-coded repair conditionals scattered through this funciton > probably ought to be refactored into helpers, e.g. > > __fuzz_want_scrub_check() { > local repair="$1" > local field="$2" > > test "${repair}" != "none" && \ > _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}" && \ > [ "${field}" != "sb 0" ] > } > > if [ $res -eq 0 ]; then > # Try an online scrub... > if __fuzz_want_scrub_check "${repair}" "${field}"; then > _scratch_scrub -n -a 1 -e continue 2>&1 > ... Will do that and send a v2, thanks for the review!
On Fri, Apr 24, 2020 at 01:03:30PM +0200, Anthony Iliopoulos wrote: > On Tue, Apr 21, 2020 at 08:37:17AM -0700, Darrick J. Wong wrote: > > On Tue, Apr 21, 2020 at 01:36:42PM +0200, Anthony Iliopoulos wrote: > > > Many xfs metadata fuzzing tests invoke xfs_scrub to detect online errors > > > even when _scratch_xfs_fuzz_metadata is invoked with "offline". This > > > causes those tests to fail with output mismatches on kernels that don't > > > enable CONFIG_XFS_ONLINE_SCRUB. Bypass scrubbing when not supported. > > > > > > Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> > > > --- > > > common/fuzzy | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/common/fuzzy b/common/fuzzy > > > index 988203b1..83ddc3e8 100644 > > > --- a/common/fuzzy > > > +++ b/common/fuzzy > > > @@ -238,7 +238,7 @@ __scratch_xfs_fuzz_field_test() { > > > if [ $res -eq 0 ]; then > > > # Try an online scrub unless we're fuzzing ag 0's sb, > > > # which scrub doesn't know how to fix. > > > - if [ "${repair}" != "none" ]; then > > > + if _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}"; then > > > > Huh? > > > > This changes the behavior of the repair=none fuzz tests, which mutate > > filesystems and then write to them without running any checking tools > > whatsoever to see if we can trip over the mutations with only regular > > filesystem operations. Whereas before, we'd skip xfs_scrub, now we > > always run it if it's supported. > > oops...right, we want to let the verifiers catch the errors here. > > Speaking of which, I've been staring at the scripts but it's not clear > to me how the repair=none fuzz tests are expected to function. Many of > those tests corrupt AG metadata headers (e.g. the AGI lsn), which means > mount bails out with an error. But the golden output doesn't account for > that, so those tests will fail (e.g. xfs/456). None of the dangerous_fuzz* tests have golden output. I've thought about posting the ones that are in my dev tree, but there's probably not much point until I/we finish fixing the things that repair misses. Ditto with scrub. TBH I wrote all these field fuzzers solely as a means to fuzz things systematically, and didn't think too hard about using them as regression tests. Back when I introduced the dangerous_norepair tests, it was success enough if the kernel didn't crash. I /think/ we've tightened things up enough that it's time to move on to more careful checking for runtime errors. > Further, for things like inode metadata fuzzing where the fs is usually > mountable, the tests will always succeed irrespective of the verifiers > firing or not (e.g. xfs/465). > > I'd assume all those repair=none tests would need to check dmesg for > metadata corruptions, so something like: > > --- a/common/fuzzy > +++ b/common/fuzzy > @@ -254,3 +254,3 @@ __scratch_xfs_fuzz_field_test() { > __scratch_xfs_fuzz_unmount > - else > + elif [ "${repair}" != "none" ]; then > (>&2 echo "re-mount failed ($res) with ${field} = ${fuzzverb}.") Hmm, seems reasonable. The _scratch_fuzz_modify helper probably needs to be modified to complain if the fs writes actually succeed. > --- a/tests/xfs/456 > +++ b/tests/xfs/456 > @@ -43,2 +43,5 @@ echo "Done fuzzing AGI" > > +_check_dmesg_for "Metadata corruption detected" || \ > + _fail "Missing metadata corruption messages!" I'd put this at the end of __scratch_xfs_fuzz_field_test, since there are dozens of tests that use this function, not just xfs/456. The long term goal is to make all the corruption bailouts report themselves to the online health system so that userspace can query the filesystem status (via xfs_spaceman) without having to scrape dmesg. --D > + > # success, all done > > If this makes sense, I'll send a separate patch to address it, and fix > all repair=none tests as above. > > > The open-coded repair conditionals scattered through this funciton > > probably ought to be refactored into helpers, e.g. > > > > __fuzz_want_scrub_check() { > > local repair="$1" > > local field="$2" > > > > test "${repair}" != "none" && \ > > _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}" && \ > > [ "${field}" != "sb 0" ] > > } > > > > if [ $res -eq 0 ]; then > > # Try an online scrub... > > if __fuzz_want_scrub_check "${repair}" "${field}"; then > > _scratch_scrub -n -a 1 -e continue 2>&1 > > ... > > Will do that and send a v2, thanks for the review!
diff --git a/common/fuzzy b/common/fuzzy index 988203b1..83ddc3e8 100644 --- a/common/fuzzy +++ b/common/fuzzy @@ -238,7 +238,7 @@ __scratch_xfs_fuzz_field_test() { if [ $res -eq 0 ]; then # Try an online scrub unless we're fuzzing ag 0's sb, # which scrub doesn't know how to fix. - if [ "${repair}" != "none" ]; then + if _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}"; then echo "++ Online scrub" if [ "$1" != "sb 0" ]; then _scratch_scrub -n -e continue 2>&1
Many xfs metadata fuzzing tests invoke xfs_scrub to detect online errors even when _scratch_xfs_fuzz_metadata is invoked with "offline". This causes those tests to fail with output mismatches on kernels that don't enable CONFIG_XFS_ONLINE_SCRUB. Bypass scrubbing when not supported. Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> --- common/fuzzy | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)