diff mbox series

[v2,3/3] common/rc: Check call order of _require_dm_target and _require_scratch*

Message ID 20210908083715.1831067-4-shinichiro.kawasaki@wdc.com (mailing list archive)
State New, archived
Series fstests: Fix order of _require_scratch* and _require_dm_target

Commit Message

Shin'ichiro Kawasaki Sept. 8, 2021, 8:37 a.m. UTC
When SCRATCH_DEV is not set and the test case does not call
_require_scratch* before _require_dm_target, _require_block_device
called from _require_dm_target fails to evaluate SCRATCH_DEV and
results in the test case failure. This failure reason is not described
in the error message and it takes some time to catch.

To catch the failure reason easier, check SCRATCH_DEV in
_require_dm_target. If SCRATCH_DEV is not set, fail the test case
and print message which requests to fix call order of _require_scratch*
and _require_dm_target. This improvement follows what _scratch_shutdown
does for _require_scratch_shutdown.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 common/rc | 3 +++
 1 file changed, 3 insertions(+)

Comments

Dave Chinner Sept. 10, 2021, 12:48 a.m. UTC | #1
On Wed, Sep 08, 2021 at 05:37:15PM +0900, Shin'ichiro Kawasaki wrote:
> When SCRATCH_DEV is not set and the test case does not call
> _require_scratch* before _require_dm_target, _require_block_device
> called from _require_dm_target fails to evaluate SCRATCH_DEV and
> results in the test case failure. This failure reason is not described
> in the error message and it takes some time to catch.

You should quote the actual failure message here so we have some
idea of whether the message that was emitted was appropriate or not
without having to go and find out how the test failed...

> To catch the failure reason easier, check SCRATCH_DEV in
> _require_dm_target. If SCRATCH_DEV is not set, fail the test case
> and print message which requests to fix call order of _require_scratch*
> and _require_dm_target. This improvement follows what _scratch_shutdown
> does for _require_scratch_shutdown.

Also, you don't need to describe the change in the commit message -
the patch does that. The first paragraph is all that is needed here
as it describes why you want to make the change.

> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  common/rc | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/common/rc b/common/rc
> index dda5da06..cbec8aaa 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -1971,6 +1971,9 @@ _require_dm_target()
>  
>  	# require SCRATCH_DEV to be a valid block device with sane BLKFLSBUF
>  	# behaviour
> +	if [ -z "$SCRATCH_DEV" ]; then
> +		_fail "_require_dm_target: call _require_scratch* first in test"
> +	fi
>  	_require_block_device $SCRATCH_DEV
>  	_require_sane_bdev_flush $SCRATCH_DEV
>  	_require_command "$DMSETUP_PROG" dmsetup

That's a notrun case, not a fail.

Also, we report the error that has occurred, not how to resolve the
problem. That's because we might change behaviour in future and now
the error message tells people to do something that is
wrong/non-existent. As such, I think the premise this change is based
on is not really valid - people running fstests are assumed to have
a level of knowledge sufficient to trace a failing test and
determine what went wrong from the error reported. i.e. the error
message should state what the problem was, not describe a potential
solution.

Also, this is not the place to check if SCRATCH_DEV is set. The
check for a NULL device should be in _require_block_device(). Oh,
wait, it already is:

_require_block_device()
{
	if [ -z "$1" ]; then
		echo "Usage: _require_block_device <dev>" 1>&2
		exit 1
	fi
....
}

And that's the error message the test emitted that you didn't
understand, right?

If so, the change here should really be to _require_block_device().
i.e.

	if [ -z "$1" ]; then
		_notrun "test requires a block device to be specified"
	fi

A quick scan shows a bunch of similar _requires checks that do
similar things with poor error messages and 'exit 1' (e.g.
_require_local_device()). _requires rules should call _notrun if the
test should not run because of incorrect setup, not 'exit 1'.

Cheers,

Dave.
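
To make the difference between the two variants discussed above concrete, here is a minimal sketch. It assumes a simplified stand-in for _notrun (the real fstests helper does more, such as recording the reason for the harness), and the _old/_new suffixes are hypothetical names used only to put both variants side by side.

```shell
# Simplified stand-in for the fstests _notrun helper (assumption: the real
# helper also records the reason for the test harness).
_notrun()
{
	echo "not run: $*"
	exit 0
}

# Behaviour quoted in the review: usage message on stderr, then exit 1.
_require_block_device_old()
{
	if [ -z "$1" ]; then
		echo "Usage: _require_block_device <dev>" 1>&2
		exit 1
	fi
}

# Variant suggested in the review: report the problem and skip the test.
_require_block_device_new()
{
	if [ -z "$1" ]; then
		_notrun "test requires a block device to be specified"
	fi
}

# Call each variant in a subshell with an empty device, which is what
# happens when SCRATCH_DEV is not set.
SCRATCH_DEV=""
old_status=0
old_msg=$( (_require_block_device_old $SCRATCH_DEV) 2>&1 ) || old_status=$?
new_status=0
new_msg=$( (_require_block_device_new $SCRATCH_DEV) 2>&1 ) || new_status=$?

echo "old: status=$old_status msg='$old_msg'"
echo "new: status=$new_status msg='$new_msg'"
```

In this sketch the old variant ends the subshell with status 1 and only a usage line, while the new variant states what was missing and exits with status 0.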
Shin'ichiro Kawasaki Sept. 10, 2021, 6:34 a.m. UTC | #2
On Sep 10, 2021 / 10:48, Dave Chinner wrote:
> On Wed, Sep 08, 2021 at 05:37:15PM +0900, Shin'ichiro Kawasaki wrote:
> > When SCRATCH_DEV is not set and the test case does not call
> > _require_scratch* before _require_dm_target, _require_block_device
> > called from _require_dm_target fails to evaluate SCRATCH_DEV and
> > results in the test case failure. This failure reason is not described
> > in the error message and it takes some time to catch.
> 
> You should quote the actual failure message here so we have some
> idea of whether the message that was emitted was appropriate or not
> without having to go and find out how the test failed...

Sorry about the lack of information. As you found below, the message was
"Usage: _require_block_device <dev>".

> 
> > To catch the failure reason easier, check SCRATCH_DEV in
> > _require_dm_target. If SCRATCH_DEV is not set, fail the test case
> > and print message which requests to fix call order of _require_scratch*
> > and _require_dm_target. This improvement follows what _scratch_shutdown
> > does for _require_scratch_shutdown.
> 
> Also, you don't need to describe the change in the commit message -
> the patch does that. The first paragraph is all that is needed here
> as it describes why you want to make the change.

I see. I will write "why" in the commit message, not "what". (In the past, I
was advised to write "what" the patch does, but I think this guide is valid
only when the change is complicated).

> 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  common/rc | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/common/rc b/common/rc
> > index dda5da06..cbec8aaa 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -1971,6 +1971,9 @@ _require_dm_target()
> >  
> >  	# require SCRATCH_DEV to be a valid block device with sane BLKFLSBUF
> >  	# behaviour
> > +	if [ -z "$SCRATCH_DEV" ]; then
> > +		_fail "_require_dm_target: call _require_scratch* first in test"
> > +	fi
> >  	_require_block_device $SCRATCH_DEV
> >  	_require_sane_bdev_flush $SCRATCH_DEV
> >  	_require_command "$DMSETUP_PROG" dmsetup
> 
> That's a notrun case, not a fail.
> 
> Also, we report the error that has occurred, not how to resolve the
> problem. That's because we might change behaviour in future and now
> the error message tells people to do something that is
> wrong/non-existent. As such, I think the premise this change is based
> on is not really valid - people running fstests are assumed to have
> a level of knowledge sufficient to trace a failing test and
> determine what went wrong from the error reported. i.e. the error
> message should state what the problem was, not describe a potential
> solution.

Thank you for the comment. These are the points I missed. At least I was
able to catch the cause, so the improvement I suggested is not a big
improvement.

> 
> Also, this is not the place to check if SCRATCH_DEV is set. The
> check for a NULL device should be in _require_block_device(). Oh,
> wait, it already is:
> 
> _require_block_device()
> {
> 	if [ -z "$1" ]; then
> 		echo "Usage: _require_block_device <dev>" 1>&2
> 		exit 1
> 	fi
> ....
> }
> 
> And that's the error message the test emitted that you didn't
> understand, right?

Right :)

> 
> If so, the change here should really be to _require_block_device().
> i.e.
> 
> 	if [ -z "$1" ]; then
> 		_notrun "test requires a block device to be specified"
> 	fi
> 
> A quick scan shows a bunch of similar _requires checks that do
> similar things with poor error messages and 'exit 1' (e.g.
> _require_local_device()). _requires rules should call _notrun if the
> test should not run because of incorrect setup, not 'exit 1'.

Thank you for your thoughts. I walked through the _require_* bash functions in
common/ and listed the 20 functions below, which call 'exit 1', _fail, or
'return 1' when their argument check fails:

--- list start ---

common/rc

  _require_scratch_size
  _require_scratch_size_nocheck
  _require_command *
  _require_block_device *
  _require_local_device *
  _require_zoned_device *
  _require_non_zoned_device *
  _require_scratch_ext4_feature
  _require_xfs_io_command
  _require_fio
  _require_batched_discard *
  _require_chattr
  _require_fs_sysfs
  _require_scratch_feature

common/btrfs

  _require_btrfs_mkfs_feature
  _require_btrfs_fs_feature

common/xfs

  _require_xfs_db_command
  _require_xfs_spaceman_command

common/encrypt

  _require_encryption_policy_support (checks arguments passed from _require_scratch_encryption)

common/renameat2

  _require_renameat2

--- list end ---

Many of the functions above check their arguments not to detect incorrect setup
but to detect calls from test cases with invalid arguments. The 6 functions
marked with * in the list check arguments that reflect the setup, such as
DEBUGFS_PROG, SCRATCH_DEV or SCRATCH_MNT. So I suggest modifying these functions
to improve their error messages and call "_notrun". What do you think about this?
Eryu Guan Sept. 12, 2021, 9:17 a.m. UTC | #3
On Fri, Sep 10, 2021 at 06:34:05AM +0000, Shinichiro Kawasaki wrote:
> On Sep 10, 2021 / 10:48, Dave Chinner wrote:
> > On Wed, Sep 08, 2021 at 05:37:15PM +0900, Shin'ichiro Kawasaki wrote:
> > > When SCRATCH_DEV is not set and the test case does not call
> > > _require_scratch* before _require_dm_target, _require_block_device
> > > called from _require_dm_target fails to evaluate SCRATCH_DEV and
> > > results in the test case failure. This failure reason is not described
> > > in the error message and it takes some time to catch.
> > 
> > You should quote the actual failure message here so we have some
> > idea of whether the message that was emitted was appropriate or not
> > without having to go and find out how the test failed...
> 
> Sorry about the lack of information. As you found below, the message was
> "Usage: _require_block_device <dev>".
> 
> > 
> > > To catch the failure reason easier, check SCRATCH_DEV in
> > > _require_dm_target. If SCRATCH_DEV is not set, fail the test case
> > > and print message which requests to fix call order of _require_scratch*
> > > and _require_dm_target. This improvement follows what _scratch_shutdown
> > > does for _require_scratch_shutdown.
> > 
> > Also, you don't need to describe the change in the commit message -
> > the patch does that. The first paragraph is all that is needed here
> > as it describes why you want to make the change.
> 
> I see. I will write "why" in the commit message, not "what". (In the past, I
> was advised to write "what" the patch does, but I think this guide is valid
> only when the change is complicated).
> 
> > 
> > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > > ---
> > >  common/rc | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/common/rc b/common/rc
> > > index dda5da06..cbec8aaa 100644
> > > --- a/common/rc
> > > +++ b/common/rc
> > > @@ -1971,6 +1971,9 @@ _require_dm_target()
> > >  
> > >  	# require SCRATCH_DEV to be a valid block device with sane BLKFLSBUF
> > >  	# behaviour
> > > +	if [ -z "$SCRATCH_DEV" ]; then
> > > +		_fail "_require_dm_target: call _require_scratch* first in test"
> > > +	fi
> > >  	_require_block_device $SCRATCH_DEV
> > >  	_require_sane_bdev_flush $SCRATCH_DEV
> > >  	_require_command "$DMSETUP_PROG" dmsetup
> > 
> > That's a notrun case, not a fail.
> > 
> > Also, we report the error that has occurred, not how to resolve the
> > problem. That's because we might change behaviour in future and now
> > the error message tells people to do something that is
> > wrong/non-existent. As such, I think the premise this change is based
> > on is not really valid - people running fstests are assumed to have
> > a level of knowledge sufficient to trace a failing test and
> > determine what went wrong from the error reported. i.e. the error
> > message should state what the problem was, not describe a potential
> > solution.
> 
> Thank you for the comment. These are the points I missed. At least I was
> able to catch the cause, so the improvement I suggested is not a big
> improvement.
> 
> > 
> > Also, this is not the place to check if SCRATCH_DEV is set. The
> > check for a NULL device should be in _require_block_device(). Oh,
> > wait, it already is:
> > 
> > _require_block_device()
> > {
> > 	if [ -z "$1" ]; then
> > 		echo "Usage: _require_block_device <dev>" 1>&2
> > 		exit 1
> > 	fi
> > ....
> > }
> > 
> > And that's the error message the test emitted that you didn't
> > understand, right?
> 
> Right :)
> 
> > 
> > If so, the change here should really be to _require_block_device().
> > i.e.
> > 
> > 	if [ -z "$1" ]; then
> > 		_notrun "test requires a block device to be specified"
> > 	fi
> > 
> > A quick scan shows a bunch of similar _requires checks that do
> > similar things with poor error messages and 'exit 1' (e.g.
> > _require_local_device()). _requires rules should call _notrun if the
> > test should not run because of incorrect setup, not 'exit 1'.
> 
> Thank you for your thoughts. I walked through the _require_* bash functions in
> common/ and listed the 20 functions below, which call 'exit 1', _fail, or
> 'return 1' when their argument check fails:
> 
> --- list start ---
> 
> common/rc
> 
>   _require_scratch_size
>   _require_scratch_size_nocheck
>   _require_command *
>   _require_block_device *
>   _require_local_device *
>   _require_zoned_device *
>   _require_non_zoned_device *
>   _require_scratch_ext4_feature
>   _require_xfs_io_command
>   _require_fio
>   _require_batched_discard *
>   _require_chattr
>   _require_fs_sysfs
>   _require_scratch_feature
> 
> common/btrfs
> 
>   _require_btrfs_mkfs_feature
>   _require_btrfs_fs_feature
> 
> common/xfs
> 
>   _require_xfs_db_command
>   _require_xfs_spaceman_command
> 
> common/encrypt
> 
>   _require_encryption_policy_support (checks arguments passed from _require_scratch_encryption)
> 
> common/renameat2
> 
>   _require_renameat2
> 
> --- list end ---
> 
> Many of the functions above check their arguments not to detect incorrect setup
> but to detect calls from test cases with invalid arguments. The 6 functions
> marked with * in the list check arguments that reflect the setup, such as
> DEBUGFS_PROG, SCRATCH_DEV or SCRATCH_MNT. So I suggest modifying these functions
> to improve their error messages and call "_notrun". What do you think about this?

IMO the _fail calls in above _require* rules are indicating function
usage errors, which are bugs in the test code. While _notrun indicates a
required condition is not met for this test.

Thanks,
Eryu

P.S. I've applied the first two patches, thanks for the fix!

> 
> -- 
> Best Regards,
> Shin'ichiro Kawasaki
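
The distinction drawn in this reply (a usage error is a bug in the test, an unmet condition is a reason to skip) can be sketched as follows. The _fail and _notrun definitions here are simplified stand-ins, and _require_demo_feature, DEMO_FEATURES and run are hypothetical names invented for the illustration; none of them are fstests code.

```shell
# Simplified stand-ins for the fstests helpers (assumption: the real ones
# also write to the harness result files).
_fail()   { echo "failed: $*"; exit 1; }
_notrun() { echo "not run: $*"; exit 0; }

# Hypothetical list of features the environment supports.
DEMO_FEATURES="quota reflink"

_require_demo_feature()
{
	# Missing argument: the test itself is written incorrectly.
	if [ -z "$1" ]; then
		_fail "usage: _require_demo_feature <feature>"
	fi
	# Valid call, but the environment lacks the feature: skip the test.
	case " $DEMO_FEATURES " in
	*" $1 "*) ;;
	*) _notrun "feature $1 not supported" ;;
	esac
}

# Run the check in a subshell and report its exit status and message.
run()
{
	local out rc=0
	out=$( (_require_demo_feature "$@") 2>&1 ) || rc=$?
	echo "rc=$rc ${out:-ok}"
}

usage_bug=$(run)
unmet=$(run verity)
met=$(run quota)
echo "$usage_bug"
echo "$unmet"
echo "$met"
```

Only the call without an argument ends with a non-zero status here; the unsupported feature produces a "not run" message, and the supported one passes silently.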
Shin'ichiro Kawasaki Sept. 12, 2021, 11:28 p.m. UTC | #4
On Sep 12, 2021 / 17:17, Eryu Guan wrote:
> On Fri, Sep 10, 2021 at 06:34:05AM +0000, Shinichiro Kawasaki wrote:
> > On Sep 10, 2021 / 10:48, Dave Chinner wrote:
> > > On Wed, Sep 08, 2021 at 05:37:15PM +0900, Shin'ichiro Kawasaki wrote:
> > > > When SCRATCH_DEV is not set and the test case does not call
> > > > _require_scratch* before _require_dm_target, _require_block_device
> > > > called from _require_dm_target fails to evaluate SCRATCH_DEV and
> > > > results in the test case failure. This failure reason is not described
> > > > in the error message and it takes some time to catch.
> > > 
> > > You should quote the actual failure message here so we have some
> > > idea of whether the message that was emitted was appropriate or not
> > > without having to go and find out how the test failed...
> > 
> > Sorry about the lack of information. As you found below, the message was
> > "Usage: _require_block_device <dev>".
> > 
> > > 
> > > > To catch the failure reason easier, check SCRATCH_DEV in
> > > > _require_dm_target. If SCRATCH_DEV is not set, fail the test case
> > > > and print message which requests to fix call order of _require_scratch*
> > > > and _require_dm_target. This improvement follows what _scratch_shutdown
> > > > does for _require_scratch_shutdown.
> > > 
> > > Also, you don't need to describe the change in the commit message -
> > > the patch does that. The first paragraph is all that is needed here
> > > as it describes why you want to make the change.
> > 
> > I see. I will write "why" in the commit message, not "what". (In the past, I
> > was advised to write "what" the patch does, but I think this guide is valid
> > only when the change is complicated).
> > 
> > > 
> > > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > > > ---
> > > >  common/rc | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/common/rc b/common/rc
> > > > index dda5da06..cbec8aaa 100644
> > > > --- a/common/rc
> > > > +++ b/common/rc
> > > > @@ -1971,6 +1971,9 @@ _require_dm_target()
> > > >  
> > > >  	# require SCRATCH_DEV to be a valid block device with sane BLKFLSBUF
> > > >  	# behaviour
> > > > +	if [ -z "$SCRATCH_DEV" ]; then
> > > > +		_fail "_require_dm_target: call _require_scratch* first in test"
> > > > +	fi
> > > >  	_require_block_device $SCRATCH_DEV
> > > >  	_require_sane_bdev_flush $SCRATCH_DEV
> > > >  	_require_command "$DMSETUP_PROG" dmsetup
> > > 
> > > That's a notrun case, not a fail.
> > > 
> > > Also, we report the error that has occurred, not how to resolve the
> > > problem. That's because we might change behaviour in future and now
> > > the error message tells people to do something that is
> > > wrong/non-existent. As such, I think the premise this change is based
> > > on is not really valid - people running fstests are assumed to have
> > > a level of knowledge sufficient to trace a failing test and
> > > determine what went wrong from the error reported. i.e. the error
> > > message should state what the problem was, not describe a potential
> > > solution.
> > 
> > Thank you for the comment. These are the points I missed. At least I was
> > able to catch the cause, so the improvement I suggested is not a big
> > improvement.
> > 
> > > 
> > > Also, this is not the place to check if SCRATCH_DEV is set. The
> > > check for a NULL device should be in _require_block_device(). Oh,
> > > wait, it already is:
> > > 
> > > _require_block_device()
> > > {
> > > 	if [ -z "$1" ]; then
> > > 		echo "Usage: _require_block_device <dev>" 1>&2
> > > 		exit 1
> > > 	fi
> > > ....
> > > }
> > > 
> > > And that's the error message the test emitted that you didn't
> > > understand, right?
> > 
> > Right :)
> > 
> > > 
> > > If so, the change here should really be to _require_block_device().
> > > i.e.
> > > 
> > > 	if [ -z "$1" ]; then
> > > 		_notrun "test requires a block device to be specified"
> > > 	fi
> > > 
> > > A quick scan shows a bunch of similar _requires checks that do
> > > similar things with poor error messages and 'exit 1' (e.g.
> > > _require_local_device()). _requires rules should call _notrun if the
> > > test should not run because of incorrect setup, not 'exit 1'.
> > 
> > Thank you for your thoughts. I walked through the _require_* bash functions in
> > common/ and listed the 20 functions below, which call 'exit 1', _fail, or
> > 'return 1' when their argument check fails:
> > 
> > --- list start ---
> > 
> > common/rc
> > 
> >   _require_scratch_size
> >   _require_scratch_size_nocheck
> >   _require_command *
> >   _require_block_device *
> >   _require_local_device *
> >   _require_zoned_device *
> >   _require_non_zoned_device *
> >   _require_scratch_ext4_feature
> >   _require_xfs_io_command
> >   _require_fio
> >   _require_batched_discard *
> >   _require_chattr
> >   _require_fs_sysfs
> >   _require_scratch_feature
> > 
> > common/btrfs
> > 
> >   _require_btrfs_mkfs_feature
> >   _require_btrfs_fs_feature
> > 
> > common/xfs
> > 
> >   _require_xfs_db_command
> >   _require_xfs_spaceman_command
> > 
> > common/encrypt
> > 
> >   _require_encryption_policy_support (checks arguments passed from _require_scratch_encryption)
> > 
> > common/renameat2
> > 
> >   _require_renameat2
> > 
> > --- list end ---
> > 
> > Many of the functions above check their arguments not to detect incorrect setup
> > but to detect calls from test cases with invalid arguments. The 6 functions
> > marked with * in the list check arguments that reflect the setup, such as
> > DEBUGFS_PROG, SCRATCH_DEV or SCRATCH_MNT. So I suggest modifying these functions
> > to improve their error messages and call "_notrun". What do you think about this?
> 
> IMO the _fail calls in above _require* rules are indicating function
> usage errors, which are bugs in the test code. While _notrun indicates a
> required condition is not met for this test.

I see. I think the _require* rules with "exit 1" also indicate usage errors and
bugs. As Dave pointed out, fstests users are assumed to have enough skill to
identify the bug, so the improvement I suggested does not have much value. I
withdraw this suggestion. Dave and Eryu, thank you for the comments.

> 
> Thanks,
> Eryu
> 
> P.S. I've applied the first two patches, thanks for the fix!

Thanks!

Patch

diff --git a/common/rc b/common/rc
index dda5da06..cbec8aaa 100644
--- a/common/rc
+++ b/common/rc
@@ -1971,6 +1971,9 @@  _require_dm_target()
 
 	# require SCRATCH_DEV to be a valid block device with sane BLKFLSBUF
 	# behaviour
+	if [ -z "$SCRATCH_DEV" ]; then
+		_fail "_require_dm_target: call _require_scratch* first in test"
+	fi
 	_require_block_device $SCRATCH_DEV
 	_require_sane_bdev_flush $SCRATCH_DEV
 	_require_command "$DMSETUP_PROG" dmsetup
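
For readers who want to reproduce the call-order problem that motivated this patch, the following is a rough sketch under stated assumptions. All four functions are reduced stand-ins that model only the SCRATCH_DEV handling; the real helpers in common/rc do considerably more, and the "not run" message text is invented for the example.

```shell
# Simplified stand-ins; only the SCRATCH_DEV handling is modelled.
_notrun() { echo "not run: $*"; exit 0; }

_require_scratch()
{
	[ -n "$SCRATCH_DEV" ] || _notrun "SCRATCH_DEV is not set"
}

_require_block_device()
{
	if [ -z "$1" ]; then
		echo "Usage: _require_block_device <dev>" 1>&2
		exit 1
	fi
}

_require_dm_target()
{
	# Unquoted on purpose, as in the patch context: an empty
	# SCRATCH_DEV expands to no argument at all.
	_require_block_device $SCRATCH_DEV
}

SCRATCH_DEV=""

# Order that triggers the hard-to-read failure.
bad_rc=0
bad_out=$( (_require_dm_target; _require_scratch) 2>&1 ) || bad_rc=$?

# Order that skips the test with an explicit reason.
good_rc=0
good_out=$( (_require_scratch; _require_dm_target) 2>&1 ) || good_rc=$?

echo "dm_target first: rc=$bad_rc, $bad_out"
echo "scratch first:   rc=$good_rc, $good_out"
```

With the scratch check first, the missing device is reported as a skip before _require_block_device is ever reached.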