diff mbox series

common/fuzzy: don't attempt online scrubbing unless supported

Message ID 20200421113643.24224-1-ailiop@suse.com
State New, archived
Headers show
Series common/fuzzy: don't attempt online scrubbing unless supported | expand

Commit Message

Anthony Iliopoulos April 21, 2020, 11:36 a.m. UTC
Many xfs metadata fuzzing tests invoke xfs_scrub to detect online errors
even when _scratch_xfs_fuzz_metadata is invoked with "offline". This
causes those tests to fail with output mismatches on kernels that don't
enable CONFIG_XFS_ONLINE_SCRUB. Bypass scrubbing when not supported.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
---
 common/fuzzy | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Darrick J. Wong April 21, 2020, 3:37 p.m. UTC | #1
On Tue, Apr 21, 2020 at 01:36:42PM +0200, Anthony Iliopoulos wrote:
> Many xfs metadata fuzzing tests invoke xfs_scrub to detect online errors
> even when _scratch_xfs_fuzz_metadata is invoked with "offline". This
> causes those tests to fail with output mismatches on kernels that don't
> enable CONFIG_XFS_ONLINE_SCRUB. Bypass scrubbing when not supported.
> 
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
>  common/fuzzy | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/common/fuzzy b/common/fuzzy
> index 988203b1..83ddc3e8 100644
> --- a/common/fuzzy
> +++ b/common/fuzzy
> @@ -238,7 +238,7 @@ __scratch_xfs_fuzz_field_test() {
>  	if [ $res -eq 0 ]; then
>  		# Try an online scrub unless we're fuzzing ag 0's sb,
>  		# which scrub doesn't know how to fix.
> -		if [ "${repair}" != "none" ]; then
> +		if _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}"; then

Huh?

This changes the behavior of the repair=none fuzz tests, which mutate
filesystems and then write to them without running any checking tools
whatsoever to see if we can trip over the mutations with only regular
filesystem operations.  Whereas before, we'd skip xfs_scrub, now we
always run it if it's supported.

The open-coded repair conditionals scattered through this funciton
probably ought to be refactored into helpers, e.g.

__fuzz_want_scrub_check() {
	local repair="$1"
	local field="$2"

	test "${repair}" != "none" && \
		_supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}" && \
		[ "${field}" != "sb 0" ]
}

if [ $res -eq 0 ]; then
	# Try an online scrub...
	if __fuzz_want_scrub_check "${repair}" "${field}"; then
		_scratch_scrub -n -a 1 -e continue 2>&1
		...

--D

>  			echo "++ Online scrub"
>  			if [ "$1" != "sb 0" ]; then
>  				_scratch_scrub -n -e continue 2>&1
> -- 
> 2.26.2
>
Anthony Iliopoulos April 24, 2020, 11:03 a.m. UTC | #2
On Tue, Apr 21, 2020 at 08:37:17AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 21, 2020 at 01:36:42PM +0200, Anthony Iliopoulos wrote:
> > Many xfs metadata fuzzing tests invoke xfs_scrub to detect online errors
> > even when _scratch_xfs_fuzz_metadata is invoked with "offline". This
> > causes those tests to fail with output mismatches on kernels that don't
> > enable CONFIG_XFS_ONLINE_SCRUB. Bypass scrubbing when not supported.
> > 
> > Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> > ---
> >  common/fuzzy | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/common/fuzzy b/common/fuzzy
> > index 988203b1..83ddc3e8 100644
> > --- a/common/fuzzy
> > +++ b/common/fuzzy
> > @@ -238,7 +238,7 @@ __scratch_xfs_fuzz_field_test() {
> >  	if [ $res -eq 0 ]; then
> >  		# Try an online scrub unless we're fuzzing ag 0's sb,
> >  		# which scrub doesn't know how to fix.
> > -		if [ "${repair}" != "none" ]; then
> > +		if _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}"; then
> 
> Huh?
> 
> This changes the behavior of the repair=none fuzz tests, which mutate
> filesystems and then write to them without running any checking tools
> whatsoever to see if we can trip over the mutations with only regular
> filesystem operations.  Whereas before, we'd skip xfs_scrub, now we
> always run it if it's supported.

oops...right, we want to let the verifiers catch the errors here.

Speaking of which, I've been staring at the scripts but it's not clear
to me how the repair=none fuzz tests are expected to function. Many of
those tests corrupt AG metadata headers (e.g. the AGI lsn), which means
mount bails out with an error. But the golden output doesn't account for
that, so those tests will fail (e.g. xfs/456).

Further, for things like inode metadata fuzzing where the fs is usually
mountable, the tests will always succeed irrespective of the verifiers
firing or not (e.g. xfs/465).

I'd assume all those repair=none tests would need to check dmesg for
metadata corruptions, so something like:

--- a/common/fuzzy
+++ b/common/fuzzy
@@ -254,3 +254,3 @@ __scratch_xfs_fuzz_field_test() {
 		__scratch_xfs_fuzz_unmount
-	else
+	elif [ "${repair}" != "none" ]; then
 		(>&2 echo "re-mount failed ($res) with ${field} = ${fuzzverb}.")

--- a/tests/xfs/456
+++ b/tests/xfs/456
@@ -43,2 +43,5 @@ echo "Done fuzzing AGI"

+_check_dmesg_for "Metadata corruption detected" || \
+	_fail "Missing metadata corruption messages!"
+
 # success, all done

If this makes sense, I'll send a separate patch to address it, and fix
all repair=none tests as above.

> The open-coded repair conditionals scattered through this funciton
> probably ought to be refactored into helpers, e.g.
> 
> __fuzz_want_scrub_check() {
> 	local repair="$1"
> 	local field="$2"
> 
> 	test "${repair}" != "none" && \
> 		_supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}" && \
> 		[ "${field}" != "sb 0" ]
> }
> 
> if [ $res -eq 0 ]; then
> 	# Try an online scrub...
> 	if __fuzz_want_scrub_check "${repair}" "${field}"; then
> 		_scratch_scrub -n -a 1 -e continue 2>&1
> 		...

Will do that and send a v2, thanks for the review!
Darrick J. Wong April 27, 2020, 5:03 p.m. UTC | #3
On Fri, Apr 24, 2020 at 01:03:30PM +0200, Anthony Iliopoulos wrote:
> On Tue, Apr 21, 2020 at 08:37:17AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 21, 2020 at 01:36:42PM +0200, Anthony Iliopoulos wrote:
> > > Many xfs metadata fuzzing tests invoke xfs_scrub to detect online errors
> > > even when _scratch_xfs_fuzz_metadata is invoked with "offline". This
> > > causes those tests to fail with output mismatches on kernels that don't
> > > enable CONFIG_XFS_ONLINE_SCRUB. Bypass scrubbing when not supported.
> > > 
> > > Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> > > ---
> > >  common/fuzzy | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/common/fuzzy b/common/fuzzy
> > > index 988203b1..83ddc3e8 100644
> > > --- a/common/fuzzy
> > > +++ b/common/fuzzy
> > > @@ -238,7 +238,7 @@ __scratch_xfs_fuzz_field_test() {
> > >  	if [ $res -eq 0 ]; then
> > >  		# Try an online scrub unless we're fuzzing ag 0's sb,
> > >  		# which scrub doesn't know how to fix.
> > > -		if [ "${repair}" != "none" ]; then
> > > +		if _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}"; then
> > 
> > Huh?
> > 
> > This changes the behavior of the repair=none fuzz tests, which mutate
> > filesystems and then write to them without running any checking tools
> > whatsoever to see if we can trip over the mutations with only regular
> > filesystem operations.  Whereas before, we'd skip xfs_scrub, now we
> > always run it if it's supported.
> 
> oops...right, we want to let the verifiers catch the errors here.
> 
> Speaking of which, I've been staring at the scripts but it's not clear
> to me how the repair=none fuzz tests are expected to function. Many of
> those tests corrupt AG metadata headers (e.g. the AGI lsn), which means
> mount bails out with an error. But the golden output doesn't account for
> that, so those tests will fail (e.g. xfs/456).

None of the dangerous_fuzz* tests have golden output.  I've thought
about posting the ones that are in my dev tree, but there's probably
not much point until I/we finish fixing the things that repair misses.
Ditto with scrub.

TBH I wrote all these field fuzzers solely as a means to fuzz things
systematically, and didn't think too hard about using them as regression
tests.  Back when I introduced the dangerous_norepair tests, it was
success enough if the kernel didn't crash.  I /think/ we've tightened
things up enough that it's time to move on to more careful checking for
runtime errors.

> Further, for things like inode metadata fuzzing where the fs is usually
> mountable, the tests will always succeed irrespective of the verifiers
> firing or not (e.g. xfs/465).
> 
> I'd assume all those repair=none tests would need to check dmesg for
> metadata corruptions, so something like:
> 
> --- a/common/fuzzy
> +++ b/common/fuzzy
> @@ -254,3 +254,3 @@ __scratch_xfs_fuzz_field_test() {
>  		__scratch_xfs_fuzz_unmount
> -	else
> +	elif [ "${repair}" != "none" ]; then
>  		(>&2 echo "re-mount failed ($res) with ${field} = ${fuzzverb}.")

Hmm, seems reasonable.  The _scratch_fuzz_modify helper probably needs
to be modified to complain if the fs writes actually succeed.

> --- a/tests/xfs/456
> +++ b/tests/xfs/456
> @@ -43,2 +43,5 @@ echo "Done fuzzing AGI"
> 
> +_check_dmesg_for "Metadata corruption detected" || \
> +	_fail "Missing metadata corruption messages!"

I'd put this at the end of __scratch_xfs_fuzz_field_test, since there
are dozens of tests that use this function, not just xfs/456.

The long term goal is to make all the corruption bailouts report
themselves to the online health system so that userspace can query the
filesystem status (via xfs_spaceman) without having to scrape dmesg.

--D

> +
>  # success, all done
> 
> If this makes sense, I'll send a separate patch to address it, and fix
> all repair=none tests as above.
> 
> > The open-coded repair conditionals scattered through this funciton
> > probably ought to be refactored into helpers, e.g.
> > 
> > __fuzz_want_scrub_check() {
> > 	local repair="$1"
> > 	local field="$2"
> > 
> > 	test "${repair}" != "none" && \
> > 		_supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}" && \
> > 		[ "${field}" != "sb 0" ]
> > }
> > 
> > if [ $res -eq 0 ]; then
> > 	# Try an online scrub...
> > 	if __fuzz_want_scrub_check "${repair}" "${field}"; then
> > 		_scratch_scrub -n -a 1 -e continue 2>&1
> > 		...
> 
> Will do that and send a v2, thanks for the review!
diff mbox series

Patch

diff --git a/common/fuzzy b/common/fuzzy
index 988203b1..83ddc3e8 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -238,7 +238,7 @@  __scratch_xfs_fuzz_field_test() {
 	if [ $res -eq 0 ]; then
 		# Try an online scrub unless we're fuzzing ag 0's sb,
 		# which scrub doesn't know how to fix.
-		if [ "${repair}" != "none" ]; then
+		if _supports_xfs_scrub "${SCRATCH_MNT}" "${SCRATCH_DEV}"; then
 			echo "++ Online scrub"
 			if [ "$1" != "sb 0" ]; then
 				_scratch_scrub -n -e continue 2>&1