diff mbox

[v2,3/3] fstests: run xfs_io as multi threaded for 'quick' tests

Message ID 1476615222-7804-3-git-send-email-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Amir Goldstein Oct. 16, 2016, 10:53 a.m. UTC
Try to run xfs_io for tests in group quick with command line
option -M which starts an idle thread before performing any io.

The purpose of this idle thread is to test io from a multi threaded
process. With single threaded process, the file table is not shared
and file structs are not reference counted.

In order to improve the chance of detecting file struct reference
leaks, we should run xfs_io commands with this option as much as
possible.

Analysis of the effect of xfs_io -M on tests runtime showed that
it may lead to slightly longer run times in extreme cases (e.g +3s
for generic/132), but has a negligable effect on runtime of tests
among the 'quick' group (worst case +0.3s for generic/130).

Therefore, we automatically add the -M flags only to tests in the
'quick' group.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 check     |  2 ++
 common/rc | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

Comments

Dave Chinner Oct. 16, 2016, 9:46 p.m. UTC | #1
On Sun, Oct 16, 2016 at 01:53:42PM +0300, Amir Goldstein wrote:
> Try to run xfs_io for tests in group quick with command line
> option -M which starts an idle thread before performing any io.
> 
> The purpose of this idle thread is to test io from a multi threaded
> process. With single threaded process, the file table is not shared
> and file structs are not reference counted.
> 
> In order to improve the chance of detecting file struct reference
> leaks, we should run xfs_io commands with this option as much as
> possible.
> 
> Analysis of the effect of xfs_io -M on tests runtime showed that
> it may lead to slightly longer run times in extreme cases (e.g +3s
> for generic/132), but has a negligable effect on runtime of tests
> among the 'quick' group (worst case +0.3s for generic/130).
> 
> Therefore, we automatically add the -M flags only to tests in the
> 'quick' group.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  check     |  2 ++
>  common/rc | 15 +++++++++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/check b/check
> index 69341d8..e568598 100755
> --- a/check
> +++ b/check
> @@ -574,6 +574,8 @@ for section in $HOST_OPTIONS_SECTIONS; do
>  
>  	    mkdir -p $RESULT_DIR
>  
> +	    export TEST_GROUPS=`grep $(basename $seqnum) "$SRC_DIR/$(dirname $seqnum)/group"`
> +

Do we really want to go down this path of changing behaviours
of utilities based on what group  they belong to? It means we can no
longer look at the test code and recreate the command line without
having to work out what group context the test is running under. And
how do we tell that it's correct and we don't inadvertantly break
it? I ask this because.....

>  	    echo -n "$seqnum"
>  
>  		if $showme; then
> diff --git a/common/rc b/common/rc
> index a838750..7c478cf 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -3799,6 +3799,21 @@ init_rc()
>  	$XFS_IO_PROG -c stat $TEST_DIR 2>&1 | grep -q "is not on an XFS filesystem" && \
>  		export XFS_IO_PROG="$XFS_IO_PROG -F"
>  
> +	if echo $TEST_GROUPS | grep -q quick; then
> +		# xfs_io -M flag runs xfs_io as multi threaded process
> +		# in order to catch fdget/fdset reference leaks, because
> +		# file structs are not reference counted in a single threaded
> +		# process.
> +		# Because reference counted fdget/fdset may lead to slightly
> +		# longer run times in extreme cases (such as generic/132),
> +		# we limit the use of -M flags to tests with short runtime,
> +		# where the effect of the flag is negligable.
> +		#
> +		# Figure out if xfs_io supports the -M option
> +		$XFS_IO_PROG -M -c quit 2>/dev/null && \
> +			export XFS_IO_PROG="$XFS_IO_PROG -M"
> +	fi
> +

.... I can't see how this works - init_rc is
called and sets $XFS_IO_PROG long before TEST_GROUPS is set and
exported. How did you verify that the correct xfs_io command lines
were being issued?

IMO, either turn it on for everything if it is supported, or make it
an explicit command line option to enable. A couple of seconds of
extra runtime here and there means nothing for a typical auto test
run which can take hours to run....

Cheers,

Dave.
Amir Goldstein Oct. 17, 2016, 7:01 a.m. UTC | #2
On Mon, Oct 17, 2016 at 12:46 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Sun, Oct 16, 2016 at 01:53:42PM +0300, Amir Goldstein wrote:
>> Try to run xfs_io for tests in group quick with command line
>> option -M which starts an idle thread before performing any io.
>>
>> The purpose of this idle thread is to test io from a multi threaded
>> process. With single threaded process, the file table is not shared
>> and file structs are not reference counted.
>>
>> In order to improve the chance of detecting file struct reference
>> leaks, we should run xfs_io commands with this option as much as
>> possible.
>>
>> Analysis of the effect of xfs_io -M on tests runtime showed that
>> it may lead to slightly longer run times in extreme cases (e.g +3s
>> for generic/132), but has a negligable effect on runtime of tests
>> among the 'quick' group (worst case +0.3s for generic/130).
>>
>> Therefore, we automatically add the -M flags only to tests in the
>> 'quick' group.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  check     |  2 ++
>>  common/rc | 15 +++++++++++++++
>>  2 files changed, 17 insertions(+)
>>
>> diff --git a/check b/check
>> index 69341d8..e568598 100755
>> --- a/check
>> +++ b/check
>> @@ -574,6 +574,8 @@ for section in $HOST_OPTIONS_SECTIONS; do
>>
>>           mkdir -p $RESULT_DIR
>>
>> +         export TEST_GROUPS=`grep $(basename $seqnum) "$SRC_DIR/$(dirname $seqnum)/group"`
>> +
>
> Do we really want to go down this path of changing behaviours
> of utilities based on what group  they belong to? It means we can no
> longer look at the test code and recreate the command line without
> having to work out what group context the test is running under. And
> how do we tell that it's correct and we don't inadvertantly break
> it? I ask this because.....
>
>>           echo -n "$seqnum"
>>
>>               if $showme; then
>> diff --git a/common/rc b/common/rc
>> index a838750..7c478cf 100644
>> --- a/common/rc
>> +++ b/common/rc
>> @@ -3799,6 +3799,21 @@ init_rc()
>>       $XFS_IO_PROG -c stat $TEST_DIR 2>&1 | grep -q "is not on an XFS filesystem" && \
>>               export XFS_IO_PROG="$XFS_IO_PROG -F"
>>
>> +     if echo $TEST_GROUPS | grep -q quick; then
>> +             # xfs_io -M flag runs xfs_io as multi threaded process
>> +             # in order to catch fdget/fdset reference leaks, because
>> +             # file structs are not reference counted in a single threaded
>> +             # process.
>> +             # Because reference counted fdget/fdset may lead to slightly
>> +             # longer run times in extreme cases (such as generic/132),
>> +             # we limit the use of -M flags to tests with short runtime,
>> +             # where the effect of the flag is negligable.
>> +             #
>> +             # Figure out if xfs_io supports the -M option
>> +             $XFS_IO_PROG -M -c quit 2>/dev/null && \
>> +                     export XFS_IO_PROG="$XFS_IO_PROG -M"
>> +     fi
>> +
>
> .... I can't see how this works - init_rc is
> called and sets $XFS_IO_PROG long before TEST_GROUPS is set and
> exported.

Correct, but common/rc is then included again from every single test,
so XFS_IO_PROG is adjusted to add the -M/i flag per test after
TEST_GROUPS is set for a specific test in ./check.
I saw the -F flag code and figured that's a good place to add -M/i as well.
With a hunch, I would say that if -F where to be added, it where to be
added twice.

> How did you verify that the correct xfs_io command lines
> were being issued?
>

Both by adding debug print and by monitoring ps

> IMO, either turn it on for everything if it is supported, or make it
> an explicit command line option to enable.

If I make the default off, then not enough people will test for reference leaks.
Since leaks may be hard to find, because they may only be on rare error path,
the motivation for a test suit IMO is to enforce this option as much
as possible.

OTOH, if everyone, always run with -i, then the fast path fdget is
never tested with
xfs_io. fast path will get tested with other tools used by tests
though, so maybe
that is not such a big concern.

> A couple of seconds of
> extra runtime here and there means nothing for a typical auto test
> run which can take hours to run....
>

In that case, I think we should turn the flag always if supported.
After all, even it tests take longer to run that's exactly the sort of penalty
that should be acceptable to pay for running with extra debugging options
enabled.

As an extra, I suggested we could turn the flag off for people running the
test with non debug kernel, or say, without CONFIG_PROVE_LOCKING.
If there are people doing that, and I imagine there are, they may care more
about test runtime and less about detecting leaks?

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir Goldstein Oct. 17, 2016, 9:49 p.m. UTC | #3
On Mon, Oct 17, 2016 at 10:01 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Oct 17, 2016 at 12:46 AM, Dave Chinner <david@fromorbit.com> wrote:
>> On Sun, Oct 16, 2016 at 01:53:42PM +0300, Amir Goldstein wrote:
>>> Try to run xfs_io for tests in group quick with command line
>>> option -M which starts an idle thread before performing any io.
>>>
>>> The purpose of this idle thread is to test io from a multi threaded
>>> process. With single threaded process, the file table is not shared
>>> and file structs are not reference counted.
>>>
>>> In order to improve the chance of detecting file struct reference
>>> leaks, we should run xfs_io commands with this option as much as
>>> possible.
>>>
>>> Analysis of the effect of xfs_io -M on tests runtime showed that
>>> it may lead to slightly longer run times in extreme cases (e.g +3s
>>> for generic/132), but has a negligable effect on runtime of tests
>>> among the 'quick' group (worst case +0.3s for generic/130).
>>>
>>> Therefore, we automatically add the -M flags only to tests in the
>>> 'quick' group.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---

...

>> A couple of seconds of
>> extra runtime here and there means nothing for a typical auto test
>> run which can take hours to run....
>>
>

Just to follow up on the extra runtime.
I did one auto test run at 1h42m without -i flag
and then auto test run at 1h45m with -i flag
I did not repeat several runs to reduce noise.
If anyone thinks that difference is too expensive please shout.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eryu Guan Oct. 20, 2016, 2:25 p.m. UTC | #4
On Mon, Oct 17, 2016 at 10:01:19AM +0300, Amir Goldstein wrote:
> On Mon, Oct 17, 2016 at 12:46 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Sun, Oct 16, 2016 at 01:53:42PM +0300, Amir Goldstein wrote:
> >> Try to run xfs_io for tests in group quick with command line
> >> option -M which starts an idle thread before performing any io.
> >>
> >> The purpose of this idle thread is to test io from a multi threaded
> >> process. With single threaded process, the file table is not shared
> >> and file structs are not reference counted.
> >>
> >> In order to improve the chance of detecting file struct reference
> >> leaks, we should run xfs_io commands with this option as much as
> >> possible.
> >>
> >> Analysis of the effect of xfs_io -M on tests runtime showed that
> >> it may lead to slightly longer run times in extreme cases (e.g +3s
> >> for generic/132), but has a negligable effect on runtime of tests
> >> among the 'quick' group (worst case +0.3s for generic/130).
> >>
> >> Therefore, we automatically add the -M flags only to tests in the
> >> 'quick' group.
> >>
> >> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >> ---
> >>  check     |  2 ++
> >>  common/rc | 15 +++++++++++++++
> >>  2 files changed, 17 insertions(+)
> >>
> >> diff --git a/check b/check
> >> index 69341d8..e568598 100755
> >> --- a/check
> >> +++ b/check
> >> @@ -574,6 +574,8 @@ for section in $HOST_OPTIONS_SECTIONS; do
> >>
> >>           mkdir -p $RESULT_DIR
> >>
> >> +         export TEST_GROUPS=`grep $(basename $seqnum) "$SRC_DIR/$(dirname $seqnum)/group"`
> >> +
> >
> > Do we really want to go down this path of changing behaviours
> > of utilities based on what group  they belong to? It means we can no
> > longer look at the test code and recreate the command line without
> > having to work out what group context the test is running under. And
> > how do we tell that it's correct and we don't inadvertantly break
> > it? I ask this because.....
> >
> >>           echo -n "$seqnum"
> >>
> >>               if $showme; then
> >> diff --git a/common/rc b/common/rc
> >> index a838750..7c478cf 100644
> >> --- a/common/rc
> >> +++ b/common/rc
> >> @@ -3799,6 +3799,21 @@ init_rc()
> >>       $XFS_IO_PROG -c stat $TEST_DIR 2>&1 | grep -q "is not on an XFS filesystem" && \
> >>               export XFS_IO_PROG="$XFS_IO_PROG -F"
> >>
> >> +     if echo $TEST_GROUPS | grep -q quick; then
> >> +             # xfs_io -M flag runs xfs_io as multi threaded process
> >> +             # in order to catch fdget/fdset reference leaks, because
> >> +             # file structs are not reference counted in a single threaded
> >> +             # process.
> >> +             # Because reference counted fdget/fdset may lead to slightly
> >> +             # longer run times in extreme cases (such as generic/132),
> >> +             # we limit the use of -M flags to tests with short runtime,
> >> +             # where the effect of the flag is negligable.
> >> +             #
> >> +             # Figure out if xfs_io supports the -M option
> >> +             $XFS_IO_PROG -M -c quit 2>/dev/null && \
> >> +                     export XFS_IO_PROG="$XFS_IO_PROG -M"
> >> +     fi
> >> +
> >
> > .... I can't see how this works - init_rc is
> > called and sets $XFS_IO_PROG long before TEST_GROUPS is set and
> > exported.
> 
> Correct, but common/rc is then included again from every single test,
> so XFS_IO_PROG is adjusted to add the -M/i flag per test after
> TEST_GROUPS is set for a specific test in ./check.
> I saw the -F flag code and figured that's a good place to add -M/i as well.
> With a hunch, I would say that if -F where to be added, it where to be
> added twice.
> 
> > How did you verify that the correct xfs_io command lines
> > were being issued?
> >
> 
> Both by adding debug print and by monitoring ps
> 
> > IMO, either turn it on for everything if it is supported, or make it
> > an explicit command line option to enable.
> 
> If I make the default off, then not enough people will test for reference leaks.
> Since leaks may be hard to find, because they may only be on rare error path,
> the motivation for a test suit IMO is to enforce this option as much
> as possible.
> 
> OTOH, if everyone, always run with -i, then the fast path fdget is
> never tested with
> xfs_io. fast path will get tested with other tools used by tests
> though, so maybe
> that is not such a big concern.

[Sorry for the late response, I'm travelling this week and the network
connection is not as good as I expected.]

I'm still having concerns about losing test coverage by enabling "-i" by
default. How about adding an command line option to disable it? So at
least we could have a way to turn it off. Or is it completely impossible
to lose any test coverage?

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir Goldstein Oct. 20, 2016, 6:27 p.m. UTC | #5
On Thu, Oct 20, 2016 at 5:25 PM, Eryu Guan <eguan@redhat.com> wrote:
> On Mon, Oct 17, 2016 at 10:01:19AM +0300, Amir Goldstein wrote:

...

>
> I'm still having concerns about losing test coverage by enabling "-i" by
> default. How about adding an command line option to disable it?

I could take the easy way and add a command line option to satisfy
concerns, but we all know that 99% of the time, nobody is going to use it,
so we will be just silencing out concerns instead of addressing them.

> So at
> least we could have a way to turn it off. Or is it completely impossible
> to lose any test coverage?

I would not argue that it is completely impossible to loose test coverage
but I would argue that there is low probability of loosing *interesting* test
coverage.

Here is the argument I am making:
A file reference leak can be anywhere, (e.g in EXDEV error path of clone
file range ioctl) so we SHOULD have -i test coverage for as many APIs
as possible with as many arguments as possible.
OTOH, the difference between slowpath and fastpath of fdget()/fdput()
for single/multi threaded process is quite invariant to the specific API,
so I do not see the value in test coverage of fastpath to all APIs.

Anyway, I'd be happy to add a command line option or env var if that is
the preference of folks, but then I would really appreciate a suggestion
for option name or var name, because it is going to be quite hard to
find an intuitive name for this test variant.

Thanks,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eryu Guan Oct. 21, 2016, 3:24 p.m. UTC | #6
On Thu, Oct 20, 2016 at 09:27:03PM +0300, Amir Goldstein wrote:
> On Thu, Oct 20, 2016 at 5:25 PM, Eryu Guan <eguan@redhat.com> wrote:
> > On Mon, Oct 17, 2016 at 10:01:19AM +0300, Amir Goldstein wrote:
> 
> ...
> 
> >
> > I'm still having concerns about losing test coverage by enabling "-i" by
> > default. How about adding an command line option to disable it?
> 
> I could take the easy way and add a command line option to satisfy
> concerns, but we all know that 99% of the time, nobody is going to use it,
> so we will be just silencing out concerns instead of addressing them.
> 
> > So at
> > least we could have a way to turn it off. Or is it completely impossible
> > to lose any test coverage?
> 
> I would not argue that it is completely impossible to loose test coverage
> but I would argue that there is low probability of loosing *interesting* test
> coverage.
> 
> Here is the argument I am making:
> A file reference leak can be anywhere, (e.g in EXDEV error path of clone
> file range ioctl) so we SHOULD have -i test coverage for as many APIs
> as possible with as many arguments as possible.
> OTOH, the difference between slowpath and fastpath of fdget()/fdput()
> for single/multi threaded process is quite invariant to the specific API,
> so I do not see the value in test coverage of fastpath to all APIs.

Then I'll take it as it is, and we can always add a new option to turn
it off when we find it necessary :) Thanks!

Eryu
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/check b/check
index 69341d8..e568598 100755
--- a/check
+++ b/check
@@ -574,6 +574,8 @@  for section in $HOST_OPTIONS_SECTIONS; do
 
 	    mkdir -p $RESULT_DIR
 
+	    export TEST_GROUPS=`grep $(basename $seqnum) "$SRC_DIR/$(dirname $seqnum)/group"`
+
 	    echo -n "$seqnum"
 
 		if $showme; then
diff --git a/common/rc b/common/rc
index a838750..7c478cf 100644
--- a/common/rc
+++ b/common/rc
@@ -3799,6 +3799,21 @@  init_rc()
 	$XFS_IO_PROG -c stat $TEST_DIR 2>&1 | grep -q "is not on an XFS filesystem" && \
 		export XFS_IO_PROG="$XFS_IO_PROG -F"
 
+	if echo $TEST_GROUPS | grep -q quick; then
+		# xfs_io -M flag runs xfs_io as multi threaded process
+		# in order to catch fdget/fdset reference leaks, because
+		# file structs are not reference counted in a single threaded
+		# process.
+		# Because reference counted fdget/fdset may lead to slightly
+		# longer run times in extreme cases (such as generic/132),
+		# we limit the use of -M flags to tests with short runtime,
+		# where the effect of the flag is negligable.
+		#
+		# Figure out if xfs_io supports the -M option
+		$XFS_IO_PROG -M -c quit 2>/dev/null && \
+			export XFS_IO_PROG="$XFS_IO_PROG -M"
+	fi
+
 	# xfs_copy doesn't work on v5 xfs yet without -d option
 	if [ "$FSTYP" == "xfs" ] && [[ $MKFS_OPTIONS =~ crc=1 ]]; then
 		export XFS_COPY_PROG="$XFS_COPY_PROG -d"