mbox series

[RFC,0/8] fstests: _cleanup() overrides are a mess

Message ID 20220524073411.1943480-1-david@fromorbit.com (mailing list archive)
Headers show
Series fstests: _cleanup() overrides are a mess | expand

Message

Dave Chinner May 24, 2022, 7:34 a.m. UTC
Hi folks,

I pulled on a string a couple of days ago, and it got out of
control. It all started when I went to kill a test with ctrl-c and
it, once again, left background processes running that I had to hunt
down and kill manually.

I then started looking a why this keeps happening, and realised that
the way we clean up on test completion is messy, inconsistent and
frequently buggy. So I started cleaning it all up, starting with the
tests/xfs directory because I saw a lot of low hanging fruit there.

Essentially, we use _cleanup() functions as a way of overriding the
default trap handler we install in _begin_fstest(). Rather than
register a new handler, we just redefine the common cleanup function
and re-implement it (poorly) in every test that does an override.
Often these overrides are completely unnecessary - I think I reduced
the total number of overrides in tests/xfs by ~30% (~190 -> ~125),
and I reudced the number of *unique overrides by a lot more than
that.

The method for overriding changes to be "stacked cleanups" rather
than "duplicated cleanups". That is, tests no longer open code:

	cd /
	rm -rf $tmp.*

THis is what common/preamble::_cleanup() does. We should call that
function to do this. Hence if we have a local cleanup that we need
to do, it becomes:

local_cleanup()
{
	rm -f $testfile
	_cleanup
}
_register_cleanup local_cleanup

While this looks more verbose, it means we can actually reuse the
same cleanup function across lots of tests. 

A large number of xfsdump tests were all using the same override
cleanup function to call _cleanup_dump. These are all changed to:

. ./common/dump
_register_cleanup _cleanup_dump

and _cleanup_dump stacks like this:

_cleanup_dump()
{
	#do xfsdump cleanup stuff

	_cleanup
}

and we don't need to do anything else. There is one xfsdump test
that needs it's own cleanup. It stacks like this:

local_cleanup()
{
	rm -f $testfile
	_cleanup_dump
}
_register_cleanup local_cleanup

All the tests that run fsstress in the background now have a common
cleanup function that kills fsstress processes defined in
common/preamble. They just do:

_register_cleanup _cleanup_fsstress

And now every test that puts fsstress in the background behaves
correctly and kills all the background fsstree processes when
interrupted.

The conversion is by no means complete. I've named the local cleanup
functions by what they do so we can go back and derive commonality
between them. The number of different variations on tearing down
loops devices is crazy, and half of them are buggy. I haven't worked
through these yet, so you'll see lots of tests with:

_loop_cleanup()
{
	......
	_cleanup
}
_register_cleanup _loop_cleanup

That have similar but different ways of cleaning up loop devices.

I also added a _no_cleanup() function, as there are a large number
of xfs fuzzer tests that want to leave a warm corpse behind so that
debugging what just happened is easy.

I also added BUS to the default signal trap set - well over a 100
tests in tests/xfs had a line like:

_register_cleanup "_cleanup" BUS

just to add BUS signals to the set that would cause the cleanup
function to run. Just make it the default!

Overall, this significantly reduces the amount of boiler plate in
tests, and sets us down the path of common cleanup functions that
tests may not even need to define. e.g. just including
./common/dmflakey registers the _cleanup_dmflakey() trap that will
do all the necessary cleanup when the test exists. This makes the
tests simpler, more robust and reduces the maintenance burden of
over 1700 individual tests....

I won't put the full diffstat in this mail, but the benefits should
be clean from the summary:

360 files changed, 420 insertions(+), 1781 deletions(-)

I've lost count of the number of test bugs I killed in removing
all this code, and that's largely just in the tests/xfs directory.
So before I go spend another couple of days on converting the rest
of fstests, I figured I better make sure everyone is OK with these
changes.

Thoughts, comments?

-Dave.

Comments

Amir Goldstein May 24, 2022, 8:29 a.m. UTC | #1
On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote:
>
> Hi folks,
>
> I pulled on a string a couple of days ago, and it got out of
> control. It all started when I went to kill a test with ctrl-c and
> it, once again, left background processes running that I had to hunt
> down and kill manually.
>
> I then started looking a why this keeps happening, and realised that
> the way we clean up on test completion is messy, inconsistent and
> frequently buggy. So I started cleaning it all up, starting with the
> tests/xfs directory because I saw a lot of low hanging fruit there.
>
> Essentially, we use _cleanup() functions as a way of overriding the
> default trap handler we install in _begin_fstest(). Rather than
> register a new handler, we just redefine the common cleanup function
> and re-implement it (poorly) in every test that does an override.
> Often these overrides are completely unnecessary - I think I reduced
> the total number of overrides in tests/xfs by ~30% (~190 -> ~125),
> and I reudced the number of *unique overrides by a lot more than
> that.
>

That looks like an awesome improvement!

> The method for overriding changes to be "stacked cleanups" rather
> than "duplicated cleanups". That is, tests no longer open code:
>
>         cd /
>         rm -rf $tmp.*
>
> THis is what common/preamble::_cleanup() does. We should call that
> function to do this. Hence if we have a local cleanup that we need
> to do, it becomes:
>
> local_cleanup()
> {
>         rm -f $testfile
>         _cleanup
> }
> _register_cleanup local_cleanup

While removing boilerplate code, we had better not create another boilerplate.
Instead of expecting test writers to always call _cleanup
if we always want _cleanup to be called we can always implicitly
chain it in _register_cleanup():

--- a/common/preamble
+++ b/common/preamble
@@ -20,7 +20,7 @@ _register_cleanup()
        shift

        test -n "$cleanup" && cleanup="${cleanup}; "
-       trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $*
+       trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $*
 }

 # Prepare to run a fstest by initializing the required global variables to
@@ -46,7 +46,7 @@ _begin_fstest()
        tmp=/tmp/$$
        status=1        # failure is the default!

-       _register_cleanup _cleanup
+       _register_cleanup

        . ./common/rc


It could be a bit weird for tests that call
_register_cleanup _cleanup

But you removed most of those in the SIGBUS patch
and even if they do exists, _cleanup() is re-entrant.


>
> While this looks more verbose, it means we can actually reuse the
> same cleanup function across lots of tests.
>
> A large number of xfsdump tests were all using the same override
> cleanup function to call _cleanup_dump. These are all changed to:
>
> . ./common/dump
> _register_cleanup _cleanup_dump
>
> and _cleanup_dump stacks like this:
>
> _cleanup_dump()
> {
>         #do xfsdump cleanup stuff
>
>         _cleanup
> }
>
> and we don't need to do anything else. There is one xfsdump test
> that needs it's own cleanup. It stacks like this:
>
> local_cleanup()
> {
>         rm -f $testfile
>         _cleanup_dump
> }
> _register_cleanup local_cleanup
>
> All the tests that run fsstress in the background now have a common
> cleanup function that kills fsstress processes defined in
> common/preamble. They just do:
>
> _register_cleanup _cleanup_fsstress
>
> And now every test that puts fsstress in the background behaves
> correctly and kills all the background fsstree processes when
> interrupted.
>
> The conversion is by no means complete. I've named the local cleanup
> functions by what they do so we can go back and derive commonality
> between them. The number of different variations on tearing down
> loops devices is crazy, and half of them are buggy. I haven't worked
> through these yet, so you'll see lots of tests with:
>
> _loop_cleanup()
> {
>         ......
>         _cleanup
> }
> _register_cleanup _loop_cleanup
>
> That have similar but different ways of cleaning up loop devices.
>
> I also added a _no_cleanup() function, as there are a large number
> of xfs fuzzer tests that want to leave a warm corpse behind so that
> debugging what just happened is easy.
>
> I also added BUS to the default signal trap set - well over a 100
> tests in tests/xfs had a line like:
>
> _register_cleanup "_cleanup" BUS
>
> just to add BUS signals to the set that would cause the cleanup
> function to run. Just make it the default!
>
> Overall, this significantly reduces the amount of boiler plate in
> tests, and sets us down the path of common cleanup functions that
> tests may not even need to define. e.g. just including
> ./common/dmflakey registers the _cleanup_dmflakey() trap that will
> do all the necessary cleanup when the test exists. This makes the
> tests simpler, more robust and reduces the maintenance burden of
> over 1700 individual tests....
>
> I won't put the full diffstat in this mail, but the benefits should
> be clean from the summary:
>
> 360 files changed, 420 insertions(+), 1781 deletions(-)
>
> I've lost count of the number of test bugs I killed in removing
> all this code, and that's largely just in the tests/xfs directory.
> So before I go spend another couple of days on converting the rest
> of fstests, I figured I better make sure everyone is OK with these
> changes.
>
> Thoughts, comments?
>

Thank you for taking this on!

Amir.
Dave Chinner May 24, 2022, 9:57 a.m. UTC | #2
On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote:
> On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > Hi folks,
> >
> > I pulled on a string a couple of days ago, and it got out of
> > control. It all started when I went to kill a test with ctrl-c and
> > it, once again, left background processes running that I had to hunt
> > down and kill manually.
> >
> > I then started looking a why this keeps happening, and realised that
> > the way we clean up on test completion is messy, inconsistent and
> > frequently buggy. So I started cleaning it all up, starting with the
> > tests/xfs directory because I saw a lot of low hanging fruit there.
> >
> > Essentially, we use _cleanup() functions as a way of overriding the
> > default trap handler we install in _begin_fstest(). Rather than
> > register a new handler, we just redefine the common cleanup function
> > and re-implement it (poorly) in every test that does an override.
> > Often these overrides are completely unnecessary - I think I reduced
> > the total number of overrides in tests/xfs by ~30% (~190 -> ~125),
> > and I reudced the number of *unique overrides by a lot more than
> > that.
> >
> 
> That looks like an awesome improvement!
> 
> > The method for overriding changes to be "stacked cleanups" rather
> > than "duplicated cleanups". That is, tests no longer open code:
> >
> >         cd /
> >         rm -rf $tmp.*
> >
> > THis is what common/preamble::_cleanup() does. We should call that
> > function to do this. Hence if we have a local cleanup that we need
> > to do, it becomes:
> >
> > local_cleanup()
> > {
> >         rm -f $testfile
> >         _cleanup
> > }
> > _register_cleanup local_cleanup
> 
> While removing boilerplate code, we had better not create another boilerplate.
> Instead of expecting test writers to always call _cleanup
> if we always want _cleanup to be called we can always implicitly
> chain it in _register_cleanup():
> 
> --- a/common/preamble
> +++ b/common/preamble
> @@ -20,7 +20,7 @@ _register_cleanup()
>         shift
> 
>         test -n "$cleanup" && cleanup="${cleanup}; "
> -       trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $*
> +       trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $*
>  }

I considered that, but then I found the _no_cleanup cases. IOWs,
this doesn't work for the cases where we want to prevent the generic
_cleanup function from being run on failure/test exit. Hence the
cleanup function stacking behaviour rather than unconditional
calling of _cleanup as per above.

Cheers,

Dave.
Amir Goldstein May 24, 2022, 10:01 a.m. UTC | #3
On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote:
> > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > Hi folks,
> > >
> > > I pulled on a string a couple of days ago, and it got out of
> > > control. It all started when I went to kill a test with ctrl-c and
> > > it, once again, left background processes running that I had to hunt
> > > down and kill manually.
> > >
> > > I then started looking a why this keeps happening, and realised that
> > > the way we clean up on test completion is messy, inconsistent and
> > > frequently buggy. So I started cleaning it all up, starting with the
> > > tests/xfs directory because I saw a lot of low hanging fruit there.
> > >
> > > Essentially, we use _cleanup() functions as a way of overriding the
> > > default trap handler we install in _begin_fstest(). Rather than
> > > register a new handler, we just redefine the common cleanup function
> > > and re-implement it (poorly) in every test that does an override.
> > > Often these overrides are completely unnecessary - I think I reduced
> > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125),
> > > and I reudced the number of *unique overrides by a lot more than
> > > that.
> > >
> >
> > That looks like an awesome improvement!
> >
> > > The method for overriding changes to be "stacked cleanups" rather
> > > than "duplicated cleanups". That is, tests no longer open code:
> > >
> > >         cd /
> > >         rm -rf $tmp.*
> > >
> > > THis is what common/preamble::_cleanup() does. We should call that
> > > function to do this. Hence if we have a local cleanup that we need
> > > to do, it becomes:
> > >
> > > local_cleanup()
> > > {
> > >         rm -f $testfile
> > >         _cleanup
> > > }
> > > _register_cleanup local_cleanup
> >
> > While removing boilerplate code, we had better not create another boilerplate.
> > Instead of expecting test writers to always call _cleanup
> > if we always want _cleanup to be called we can always implicitly
> > chain it in _register_cleanup():
> >
> > --- a/common/preamble
> > +++ b/common/preamble
> > @@ -20,7 +20,7 @@ _register_cleanup()
> >         shift
> >
> >         test -n "$cleanup" && cleanup="${cleanup}; "
> > -       trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $*
> > +       trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $*
> >  }
>
> I considered that, but then I found the _no_cleanup cases. IOWs,
> this doesn't work for the cases where we want to prevent the generic
> _cleanup function from being run on failure/test exit. Hence the
> cleanup function stacking behaviour rather than unconditional
> calling of _cleanup as per above.
>

I didn't know about those.
Since you went to all this trouble to find them, can you provide a reference.
I wonder, what could ever be the reason not to want to rm $tmp.*?

Thanks,
Amir.
Dave Chinner May 24, 2022, 10:13 a.m. UTC | #4
On Tue, May 24, 2022 at 01:01:57PM +0300, Amir Goldstein wrote:
> On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote:
> > > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > Hi folks,
> > > >
> > > > I pulled on a string a couple of days ago, and it got out of
> > > > control. It all started when I went to kill a test with ctrl-c and
> > > > it, once again, left background processes running that I had to hunt
> > > > down and kill manually.
> > > >
> > > > I then started looking a why this keeps happening, and realised that
> > > > the way we clean up on test completion is messy, inconsistent and
> > > > frequently buggy. So I started cleaning it all up, starting with the
> > > > tests/xfs directory because I saw a lot of low hanging fruit there.
> > > >
> > > > Essentially, we use _cleanup() functions as a way of overriding the
> > > > default trap handler we install in _begin_fstest(). Rather than
> > > > register a new handler, we just redefine the common cleanup function
> > > > and re-implement it (poorly) in every test that does an override.
> > > > Often these overrides are completely unnecessary - I think I reduced
> > > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125),
> > > > and I reudced the number of *unique overrides by a lot more than
> > > > that.
> > > >
> > >
> > > That looks like an awesome improvement!
> > >
> > > > The method for overriding changes to be "stacked cleanups" rather
> > > > than "duplicated cleanups". That is, tests no longer open code:
> > > >
> > > >         cd /
> > > >         rm -rf $tmp.*
> > > >
> > > > THis is what common/preamble::_cleanup() does. We should call that
> > > > function to do this. Hence if we have a local cleanup that we need
> > > > to do, it becomes:
> > > >
> > > > local_cleanup()
> > > > {
> > > >         rm -f $testfile
> > > >         _cleanup
> > > > }
> > > > _register_cleanup local_cleanup
> > >
> > > While removing boilerplate code, we had better not create another boilerplate.
> > > Instead of expecting test writers to always call _cleanup
> > > if we always want _cleanup to be called we can always implicitly
> > > chain it in _register_cleanup():
> > >
> > > --- a/common/preamble
> > > +++ b/common/preamble
> > > @@ -20,7 +20,7 @@ _register_cleanup()
> > >         shift
> > >
> > >         test -n "$cleanup" && cleanup="${cleanup}; "
> > > -       trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $*
> > > +       trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $*
> > >  }
> >
> > I considered that, but then I found the _no_cleanup cases. IOWs,
> > this doesn't work for the cases where we want to prevent the generic
> > _cleanup function from being run on failure/test exit. Hence the
> > cleanup function stacking behaviour rather than unconditional
> > calling of _cleanup as per above.
> >
> 
> I didn't know about those.
> Since you went to all this trouble to find them, can you provide a reference.
> I wonder, what could ever be the reason not to want to rm $tmp.*?

[PATCH 6/8] fstests: consolidate no cleanup test setup

Cheers,

Dave.
Amir Goldstein May 24, 2022, 12:14 p.m. UTC | #5
On Tue, May 24, 2022 at 1:13 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, May 24, 2022 at 01:01:57PM +0300, Amir Goldstein wrote:
> > On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote:
> > > > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > >
> > > > > Hi folks,
> > > > >
> > > > > I pulled on a string a couple of days ago, and it got out of
> > > > > control. It all started when I went to kill a test with ctrl-c and
> > > > > it, once again, left background processes running that I had to hunt
> > > > > down and kill manually.
> > > > >
> > > > > I then started looking a why this keeps happening, and realised that
> > > > > the way we clean up on test completion is messy, inconsistent and
> > > > > frequently buggy. So I started cleaning it all up, starting with the
> > > > > tests/xfs directory because I saw a lot of low hanging fruit there.
> > > > >
> > > > > Essentially, we use _cleanup() functions as a way of overriding the
> > > > > default trap handler we install in _begin_fstest(). Rather than
> > > > > register a new handler, we just redefine the common cleanup function
> > > > > and re-implement it (poorly) in every test that does an override.
> > > > > Often these overrides are completely unnecessary - I think I reduced
> > > > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125),
> > > > > and I reudced the number of *unique overrides by a lot more than
> > > > > that.
> > > > >
> > > >
> > > > That looks like an awesome improvement!
> > > >
> > > > > The method for overriding changes to be "stacked cleanups" rather
> > > > > than "duplicated cleanups". That is, tests no longer open code:
> > > > >
> > > > >         cd /
> > > > >         rm -rf $tmp.*
> > > > >
> > > > > THis is what common/preamble::_cleanup() does. We should call that
> > > > > function to do this. Hence if we have a local cleanup that we need
> > > > > to do, it becomes:
> > > > >
> > > > > local_cleanup()
> > > > > {
> > > > >         rm -f $testfile
> > > > >         _cleanup
> > > > > }
> > > > > _register_cleanup local_cleanup
> > > >
> > > > While removing boilerplate code, we had better not create another boilerplate.
> > > > Instead of expecting test writers to always call _cleanup
> > > > if we always want _cleanup to be called we can always implicitly
> > > > chain it in _register_cleanup():
> > > >
> > > > --- a/common/preamble
> > > > +++ b/common/preamble
> > > > @@ -20,7 +20,7 @@ _register_cleanup()
> > > >         shift
> > > >
> > > >         test -n "$cleanup" && cleanup="${cleanup}; "
> > > > -       trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $*
> > > > +       trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $*
> > > >  }
> > >
> > > I considered that, but then I found the _no_cleanup cases. IOWs,
> > > this doesn't work for the cases where we want to prevent the generic
> > > _cleanup function from being run on failure/test exit. Hence the
> > > cleanup function stacking behaviour rather than unconditional
> > > calling of _cleanup as per above.
> > >
> >
> > I didn't know about those.
> > Since you went to all this trouble to find them, can you provide a reference.
> > I wonder, what could ever be the reason not to want to rm $tmp.*?
>
> [PATCH 6/8] fstests: consolidate no cleanup test setup
>

Ah I see.
It might have been better to explicitly opt-out of cleanup
only for those tests via _register_no_cleanup or _unregister_cleanup

similar to the common _require_scratch and the exceptions of
_require_scratch_nocheck

But that could be done by followup cleanup and someone has
an itch to scratch ;)

Thanks,
Amir.
Dave Chinner May 24, 2022, 12:28 p.m. UTC | #6
On Tue, May 24, 2022 at 03:14:39PM +0300, Amir Goldstein wrote:
> On Tue, May 24, 2022 at 1:13 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Tue, May 24, 2022 at 01:01:57PM +0300, Amir Goldstein wrote:
> > > On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote:
> > > > > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > I pulled on a string a couple of days ago, and it got out of
> > > > > > control. It all started when I went to kill a test with ctrl-c and
> > > > > > it, once again, left background processes running that I had to hunt
> > > > > > down and kill manually.
> > > > > >
> > > > > > I then started looking a why this keeps happening, and realised that
> > > > > > the way we clean up on test completion is messy, inconsistent and
> > > > > > frequently buggy. So I started cleaning it all up, starting with the
> > > > > > tests/xfs directory because I saw a lot of low hanging fruit there.
> > > > > >
> > > > > > Essentially, we use _cleanup() functions as a way of overriding the
> > > > > > default trap handler we install in _begin_fstest(). Rather than
> > > > > > register a new handler, we just redefine the common cleanup function
> > > > > > and re-implement it (poorly) in every test that does an override.
> > > > > > Often these overrides are completely unnecessary - I think I reduced
> > > > > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125),
> > > > > > and I reudced the number of *unique overrides by a lot more than
> > > > > > that.
> > > > > >
> > > > >
> > > > > That looks like an awesome improvement!
> > > > >
> > > > > > The method for overriding changes to be "stacked cleanups" rather
> > > > > > than "duplicated cleanups". That is, tests no longer open code:
> > > > > >
> > > > > >         cd /
> > > > > >         rm -rf $tmp.*
> > > > > >
> > > > > > THis is what common/preamble::_cleanup() does. We should call that
> > > > > > function to do this. Hence if we have a local cleanup that we need
> > > > > > to do, it becomes:
> > > > > >
> > > > > > local_cleanup()
> > > > > > {
> > > > > >         rm -f $testfile
> > > > > >         _cleanup
> > > > > > }
> > > > > > _register_cleanup local_cleanup
> > > > >
> > > > > While removing boilerplate code, we had better not create another boilerplate.
> > > > > Instead of expecting test writers to always call _cleanup
> > > > > if we always want _cleanup to be called we can always implicitly
> > > > > chain it in _register_cleanup():
> > > > >
> > > > > --- a/common/preamble
> > > > > +++ b/common/preamble
> > > > > @@ -20,7 +20,7 @@ _register_cleanup()
> > > > >         shift
> > > > >
> > > > >         test -n "$cleanup" && cleanup="${cleanup}; "
> > > > > -       trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $*
> > > > > +       trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $*
> > > > >  }
> > > >
> > > > I considered that, but then I found the _no_cleanup cases. IOWs,
> > > > this doesn't work for the cases where we want to prevent the generic
> > > > _cleanup function from being run on failure/test exit. Hence the
> > > > cleanup function stacking behaviour rather than unconditional
> > > > calling of _cleanup as per above.
> > > >
> > >
> > > I didn't know about those.
> > > Since you went to all this trouble to find them, can you provide a reference.
> > > I wonder, what could ever be the reason not to want to rm $tmp.*?
> >
> > [PATCH 6/8] fstests: consolidate no cleanup test setup
> >
> 
> Ah I see.
> It might have been better to explicitly opt-out of cleanup
> only for those tests via _register_no_cleanup or _unregister_cleanup

Premature optimisation. 

-Dave.
Amir Goldstein May 24, 2022, 12:34 p.m. UTC | #7
On Tue, May 24, 2022 at 3:28 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, May 24, 2022 at 03:14:39PM +0300, Amir Goldstein wrote:
> > On Tue, May 24, 2022 at 1:13 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Tue, May 24, 2022 at 01:01:57PM +0300, Amir Goldstein wrote:
> > > > On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote:
> > > > >
> > > > > On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote:
> > > > > > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > > > >
> > > > > > > Hi folks,
> > > > > > >
> > > > > > > I pulled on a string a couple of days ago, and it got out of
> > > > > > > control. It all started when I went to kill a test with ctrl-c and
> > > > > > > it, once again, left background processes running that I had to hunt
> > > > > > > down and kill manually.
> > > > > > >
> > > > > > > I then started looking a why this keeps happening, and realised that
> > > > > > > the way we clean up on test completion is messy, inconsistent and
> > > > > > > frequently buggy. So I started cleaning it all up, starting with the
> > > > > > > tests/xfs directory because I saw a lot of low hanging fruit there.
> > > > > > >
> > > > > > > Essentially, we use _cleanup() functions as a way of overriding the
> > > > > > > default trap handler we install in _begin_fstest(). Rather than
> > > > > > > register a new handler, we just redefine the common cleanup function
> > > > > > > and re-implement it (poorly) in every test that does an override.
> > > > > > > Often these overrides are completely unnecessary - I think I reduced
> > > > > > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125),
> > > > > > > and I reudced the number of *unique overrides by a lot more than
> > > > > > > that.
> > > > > > >
> > > > > >
> > > > > > That looks like an awesome improvement!
> > > > > >
> > > > > > > The method for overriding changes to be "stacked cleanups" rather
> > > > > > > than "duplicated cleanups". That is, tests no longer open code:
> > > > > > >
> > > > > > >         cd /
> > > > > > >         rm -rf $tmp.*
> > > > > > >
> > > > > > > THis is what common/preamble::_cleanup() does. We should call that
> > > > > > > function to do this. Hence if we have a local cleanup that we need
> > > > > > > to do, it becomes:
> > > > > > >
> > > > > > > local_cleanup()
> > > > > > > {
> > > > > > >         rm -f $testfile
> > > > > > >         _cleanup
> > > > > > > }
> > > > > > > _register_cleanup local_cleanup
> > > > > >
> > > > > > While removing boilerplate code, we had better not create another boilerplate.
> > > > > > Instead of expecting test writers to always call _cleanup
> > > > > > if we always want _cleanup to be called we can always implicitly
> > > > > > chain it in _register_cleanup():
> > > > > >
> > > > > > --- a/common/preamble
> > > > > > +++ b/common/preamble
> > > > > > @@ -20,7 +20,7 @@ _register_cleanup()
> > > > > >         shift
> > > > > >
> > > > > >         test -n "$cleanup" && cleanup="${cleanup}; "
> > > > > > -       trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $*
> > > > > > +       trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $*
> > > > > >  }
> > > > >
> > > > > I considered that, but then I found the _no_cleanup cases. IOWs,
> > > > > this doesn't work for the cases where we want to prevent the generic
> > > > > _cleanup function from being run on failure/test exit. Hence the
> > > > > cleanup function stacking behaviour rather than unconditional
> > > > > calling of _cleanup as per above.
> > > > >
> > > >
> > > > I didn't know about those.
> > > > Since you went to all this trouble to find them, can you provide a reference.
> > > > I wonder, what could ever be the reason not to want to rm $tmp.*?
> > >
> > > [PATCH 6/8] fstests: consolidate no cleanup test setup
> > >
> >
> > Ah I see.
> > It might have been better to explicitly opt-out of cleanup
> > only for those tests via _register_no_cleanup or _unregister_cleanup
>
> Premature optimisation.
>

It's not an optimization at all.

The purpose of doing it this way would be to eliminate the human mistakes
(from which you found so many in this work) and create a situation where
_cleanup() is called for all tests, unless the test author really
wants to avoid it.

But anyway, no need to fix everything at one go.
There will still be plenty of ways that a test author can write a
test that does not clean after itself.

Thanks,
Amir.