Message ID | 20220524073411.1943480-1-david@fromorbit.com (mailing list archive) |
---|---|
Headers | show |
Series | fstests: _cleanup() overrides are a mess | expand |
On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote: > > Hi folks, > > I pulled on a string a couple of days ago, and it got out of > control. It all started when I went to kill a test with ctrl-c and > it, once again, left background processes running that I had to hunt > down and kill manually. > > I then started looking a why this keeps happening, and realised that > the way we clean up on test completion is messy, inconsistent and > frequently buggy. So I started cleaning it all up, starting with the > tests/xfs directory because I saw a lot of low hanging fruit there. > > Essentially, we use _cleanup() functions as a way of overriding the > default trap handler we install in _begin_fstest(). Rather than > register a new handler, we just redefine the common cleanup function > and re-implement it (poorly) in every test that does an override. > Often these overrides are completely unnecessary - I think I reduced > the total number of overrides in tests/xfs by ~30% (~190 -> ~125), > and I reudced the number of *unique overrides by a lot more than > that. > That looks like an awesome improvement! > The method for overriding changes to be "stacked cleanups" rather > than "duplicated cleanups". That is, tests no longer open code: > > cd / > rm -rf $tmp.* > > THis is what common/preamble::_cleanup() does. We should call that > function to do this. Hence if we have a local cleanup that we need > to do, it becomes: > > local_cleanup() > { > rm -f $testfile > _cleanup > } > _register_cleanup local_cleanup While removing boilerplate code, we had better not create another boilerplate. Instead of expecting test writers to always call _cleanup if we always want _cleanup to be called we can always implicitly chain it in _register_cleanup(): --- a/common/preamble +++ b/common/preamble @@ -20,7 +20,7 @@ _register_cleanup() shift test -n "$cleanup" && cleanup="${cleanup}; " - trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $* + trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $* } # Prepare to run a fstest by initializing the required global variables to @@ -46,7 +46,7 @@ _begin_fstest() tmp=/tmp/$$ status=1 # failure is the default! - _register_cleanup _cleanup + _register_cleanup . ./common/rc It could be a bit weird for tests that call _register_cleanup _cleanup But you removed most of those in the SIGBUS patch and even if they do exists, _cleanup() is re-entrant. > > While this looks more verbose, it means we can actually reuse the > same cleanup function across lots of tests. > > A large number of xfsdump tests were all using the same override > cleanup function to call _cleanup_dump. These are all changed to: > > . ./common/dump > _register_cleanup _cleanup_dump > > and _cleanup_dump stacks like this: > > _cleanup_dump() > { > #do xfsdump cleanup stuff > > _cleanup > } > > and we don't need to do anything else. There is one xfsdump test > that needs it's own cleanup. It stacks like this: > > local_cleanup() > { > rm -f $testfile > _cleanup_dump > } > _register_cleanup local_cleanup > > All the tests that run fsstress in the background now have a common > cleanup function that kills fsstress processes defined in > common/preamble. They just do: > > _register_cleanup _cleanup_fsstress > > And now every test that puts fsstress in the background behaves > correctly and kills all the background fsstree processes when > interrupted. > > The conversion is by no means complete. I've named the local cleanup > functions by what they do so we can go back and derive commonality > between them. The number of different variations on tearing down > loops devices is crazy, and half of them are buggy. I haven't worked > through these yet, so you'll see lots of tests with: > > _loop_cleanup() > { > ...... > _cleanup > } > _register_cleanup _loop_cleanup > > That have similar but different ways of cleaning up loop devices. > > I also added a _no_cleanup() function, as there are a large number > of xfs fuzzer tests that want to leave a warm corpse behind so that > debugging what just happened is easy. > > I also added BUS to the default signal trap set - well over a 100 > tests in tests/xfs had a line like: > > _register_cleanup "_cleanup" BUS > > just to add BUS signals to the set that would cause the cleanup > function to run. Just make it the default! > > Overall, this significantly reduces the amount of boiler plate in > tests, and sets us down the path of common cleanup functions that > tests may not even need to define. e.g. just including > ./common/dmflakey registers the _cleanup_dmflakey() trap that will > do all the necessary cleanup when the test exists. This makes the > tests simpler, more robust and reduces the maintenance burden of > over 1700 individual tests.... > > I won't put the full diffstat in this mail, but the benefits should > be clean from the summary: > > 360 files changed, 420 insertions(+), 1781 deletions(-) > > I've lost count of the number of test bugs I killed in removing > all this code, and that's largely just in the tests/xfs directory. > So before I go spend another couple of days on converting the rest > of fstests, I figured I better make sure everyone is OK with these > changes. > > Thoughts, comments? > Thank you for taking this on! Amir.
On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote: > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote: > > > > Hi folks, > > > > I pulled on a string a couple of days ago, and it got out of > > control. It all started when I went to kill a test with ctrl-c and > > it, once again, left background processes running that I had to hunt > > down and kill manually. > > > > I then started looking a why this keeps happening, and realised that > > the way we clean up on test completion is messy, inconsistent and > > frequently buggy. So I started cleaning it all up, starting with the > > tests/xfs directory because I saw a lot of low hanging fruit there. > > > > Essentially, we use _cleanup() functions as a way of overriding the > > default trap handler we install in _begin_fstest(). Rather than > > register a new handler, we just redefine the common cleanup function > > and re-implement it (poorly) in every test that does an override. > > Often these overrides are completely unnecessary - I think I reduced > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125), > > and I reudced the number of *unique overrides by a lot more than > > that. > > > > That looks like an awesome improvement! > > > The method for overriding changes to be "stacked cleanups" rather > > than "duplicated cleanups". That is, tests no longer open code: > > > > cd / > > rm -rf $tmp.* > > > > THis is what common/preamble::_cleanup() does. We should call that > > function to do this. Hence if we have a local cleanup that we need > > to do, it becomes: > > > > local_cleanup() > > { > > rm -f $testfile > > _cleanup > > } > > _register_cleanup local_cleanup > > While removing boilerplate code, we had better not create another boilerplate. > Instead of expecting test writers to always call _cleanup > if we always want _cleanup to be called we can always implicitly > chain it in _register_cleanup(): > > --- a/common/preamble > +++ b/common/preamble > @@ -20,7 +20,7 @@ _register_cleanup() > shift > > test -n "$cleanup" && cleanup="${cleanup}; " > - trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $* > + trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $* > } I considered that, but then I found the _no_cleanup cases. IOWs, this doesn't work for the cases where we want to prevent the generic _cleanup function from being run on failure/test exit. Hence the cleanup function stacking behaviour rather than unconditional calling of _cleanup as per above. Cheers, Dave.
On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote: > > On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote: > > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > Hi folks, > > > > > > I pulled on a string a couple of days ago, and it got out of > > > control. It all started when I went to kill a test with ctrl-c and > > > it, once again, left background processes running that I had to hunt > > > down and kill manually. > > > > > > I then started looking a why this keeps happening, and realised that > > > the way we clean up on test completion is messy, inconsistent and > > > frequently buggy. So I started cleaning it all up, starting with the > > > tests/xfs directory because I saw a lot of low hanging fruit there. > > > > > > Essentially, we use _cleanup() functions as a way of overriding the > > > default trap handler we install in _begin_fstest(). Rather than > > > register a new handler, we just redefine the common cleanup function > > > and re-implement it (poorly) in every test that does an override. > > > Often these overrides are completely unnecessary - I think I reduced > > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125), > > > and I reudced the number of *unique overrides by a lot more than > > > that. > > > > > > > That looks like an awesome improvement! > > > > > The method for overriding changes to be "stacked cleanups" rather > > > than "duplicated cleanups". That is, tests no longer open code: > > > > > > cd / > > > rm -rf $tmp.* > > > > > > THis is what common/preamble::_cleanup() does. We should call that > > > function to do this. Hence if we have a local cleanup that we need > > > to do, it becomes: > > > > > > local_cleanup() > > > { > > > rm -f $testfile > > > _cleanup > > > } > > > _register_cleanup local_cleanup > > > > While removing boilerplate code, we had better not create another boilerplate. > > Instead of expecting test writers to always call _cleanup > > if we always want _cleanup to be called we can always implicitly > > chain it in _register_cleanup(): > > > > --- a/common/preamble > > +++ b/common/preamble > > @@ -20,7 +20,7 @@ _register_cleanup() > > shift > > > > test -n "$cleanup" && cleanup="${cleanup}; " > > - trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $* > > + trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $* > > } > > I considered that, but then I found the _no_cleanup cases. IOWs, > this doesn't work for the cases where we want to prevent the generic > _cleanup function from being run on failure/test exit. Hence the > cleanup function stacking behaviour rather than unconditional > calling of _cleanup as per above. > I didn't know about those. Since you went to all this trouble to find them, can you provide a reference. I wonder, what could ever be the reason not to want to rm $tmp.*? Thanks, Amir.
On Tue, May 24, 2022 at 01:01:57PM +0300, Amir Goldstein wrote: > On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote: > > > > On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote: > > > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > Hi folks, > > > > > > > > I pulled on a string a couple of days ago, and it got out of > > > > control. It all started when I went to kill a test with ctrl-c and > > > > it, once again, left background processes running that I had to hunt > > > > down and kill manually. > > > > > > > > I then started looking a why this keeps happening, and realised that > > > > the way we clean up on test completion is messy, inconsistent and > > > > frequently buggy. So I started cleaning it all up, starting with the > > > > tests/xfs directory because I saw a lot of low hanging fruit there. > > > > > > > > Essentially, we use _cleanup() functions as a way of overriding the > > > > default trap handler we install in _begin_fstest(). Rather than > > > > register a new handler, we just redefine the common cleanup function > > > > and re-implement it (poorly) in every test that does an override. > > > > Often these overrides are completely unnecessary - I think I reduced > > > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125), > > > > and I reudced the number of *unique overrides by a lot more than > > > > that. > > > > > > > > > > That looks like an awesome improvement! > > > > > > > The method for overriding changes to be "stacked cleanups" rather > > > > than "duplicated cleanups". That is, tests no longer open code: > > > > > > > > cd / > > > > rm -rf $tmp.* > > > > > > > > THis is what common/preamble::_cleanup() does. We should call that > > > > function to do this. Hence if we have a local cleanup that we need > > > > to do, it becomes: > > > > > > > > local_cleanup() > > > > { > > > > rm -f $testfile > > > > _cleanup > > > > } > > > > _register_cleanup local_cleanup > > > > > > While removing boilerplate code, we had better not create another boilerplate. > > > Instead of expecting test writers to always call _cleanup > > > if we always want _cleanup to be called we can always implicitly > > > chain it in _register_cleanup(): > > > > > > --- a/common/preamble > > > +++ b/common/preamble > > > @@ -20,7 +20,7 @@ _register_cleanup() > > > shift > > > > > > test -n "$cleanup" && cleanup="${cleanup}; " > > > - trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $* > > > + trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $* > > > } > > > > I considered that, but then I found the _no_cleanup cases. IOWs, > > this doesn't work for the cases where we want to prevent the generic > > _cleanup function from being run on failure/test exit. Hence the > > cleanup function stacking behaviour rather than unconditional > > calling of _cleanup as per above. > > > > I didn't know about those. > Since you went to all this trouble to find them, can you provide a reference. > I wonder, what could ever be the reason not to want to rm $tmp.*? [PATCH 6/8] fstests: consolidate no cleanup test setup Cheers, Dave.
On Tue, May 24, 2022 at 1:13 PM Dave Chinner <david@fromorbit.com> wrote: > > On Tue, May 24, 2022 at 01:01:57PM +0300, Amir Goldstein wrote: > > On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote: > > > > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > > > Hi folks, > > > > > > > > > > I pulled on a string a couple of days ago, and it got out of > > > > > control. It all started when I went to kill a test with ctrl-c and > > > > > it, once again, left background processes running that I had to hunt > > > > > down and kill manually. > > > > > > > > > > I then started looking a why this keeps happening, and realised that > > > > > the way we clean up on test completion is messy, inconsistent and > > > > > frequently buggy. So I started cleaning it all up, starting with the > > > > > tests/xfs directory because I saw a lot of low hanging fruit there. > > > > > > > > > > Essentially, we use _cleanup() functions as a way of overriding the > > > > > default trap handler we install in _begin_fstest(). Rather than > > > > > register a new handler, we just redefine the common cleanup function > > > > > and re-implement it (poorly) in every test that does an override. > > > > > Often these overrides are completely unnecessary - I think I reduced > > > > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125), > > > > > and I reudced the number of *unique overrides by a lot more than > > > > > that. > > > > > > > > > > > > > That looks like an awesome improvement! > > > > > > > > > The method for overriding changes to be "stacked cleanups" rather > > > > > than "duplicated cleanups". That is, tests no longer open code: > > > > > > > > > > cd / > > > > > rm -rf $tmp.* > > > > > > > > > > THis is what common/preamble::_cleanup() does. We should call that > > > > > function to do this. Hence if we have a local cleanup that we need > > > > > to do, it becomes: > > > > > > > > > > local_cleanup() > > > > > { > > > > > rm -f $testfile > > > > > _cleanup > > > > > } > > > > > _register_cleanup local_cleanup > > > > > > > > While removing boilerplate code, we had better not create another boilerplate. > > > > Instead of expecting test writers to always call _cleanup > > > > if we always want _cleanup to be called we can always implicitly > > > > chain it in _register_cleanup(): > > > > > > > > --- a/common/preamble > > > > +++ b/common/preamble > > > > @@ -20,7 +20,7 @@ _register_cleanup() > > > > shift > > > > > > > > test -n "$cleanup" && cleanup="${cleanup}; " > > > > - trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $* > > > > + trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $* > > > > } > > > > > > I considered that, but then I found the _no_cleanup cases. IOWs, > > > this doesn't work for the cases where we want to prevent the generic > > > _cleanup function from being run on failure/test exit. Hence the > > > cleanup function stacking behaviour rather than unconditional > > > calling of _cleanup as per above. > > > > > > > I didn't know about those. > > Since you went to all this trouble to find them, can you provide a reference. > > I wonder, what could ever be the reason not to want to rm $tmp.*? > > [PATCH 6/8] fstests: consolidate no cleanup test setup > Ah I see. It might have been better to explicitly opt-out of cleanup only for those tests via _register_no_cleanup or _unregister_cleanup similar to the common _require_scratch and the exceptions of _require_scratch_nocheck But that could be done by followup cleanup and someone has an itch to scratch ;) Thanks, Amir.
On Tue, May 24, 2022 at 03:14:39PM +0300, Amir Goldstein wrote: > On Tue, May 24, 2022 at 1:13 PM Dave Chinner <david@fromorbit.com> wrote: > > > > On Tue, May 24, 2022 at 01:01:57PM +0300, Amir Goldstein wrote: > > > On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote: > > > > > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > > > > > Hi folks, > > > > > > > > > > > > I pulled on a string a couple of days ago, and it got out of > > > > > > control. It all started when I went to kill a test with ctrl-c and > > > > > > it, once again, left background processes running that I had to hunt > > > > > > down and kill manually. > > > > > > > > > > > > I then started looking a why this keeps happening, and realised that > > > > > > the way we clean up on test completion is messy, inconsistent and > > > > > > frequently buggy. So I started cleaning it all up, starting with the > > > > > > tests/xfs directory because I saw a lot of low hanging fruit there. > > > > > > > > > > > > Essentially, we use _cleanup() functions as a way of overriding the > > > > > > default trap handler we install in _begin_fstest(). Rather than > > > > > > register a new handler, we just redefine the common cleanup function > > > > > > and re-implement it (poorly) in every test that does an override. > > > > > > Often these overrides are completely unnecessary - I think I reduced > > > > > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125), > > > > > > and I reudced the number of *unique overrides by a lot more than > > > > > > that. > > > > > > > > > > > > > > > > That looks like an awesome improvement! > > > > > > > > > > > The method for overriding changes to be "stacked cleanups" rather > > > > > > than "duplicated cleanups". That is, tests no longer open code: > > > > > > > > > > > > cd / > > > > > > rm -rf $tmp.* > > > > > > > > > > > > THis is what common/preamble::_cleanup() does. We should call that > > > > > > function to do this. Hence if we have a local cleanup that we need > > > > > > to do, it becomes: > > > > > > > > > > > > local_cleanup() > > > > > > { > > > > > > rm -f $testfile > > > > > > _cleanup > > > > > > } > > > > > > _register_cleanup local_cleanup > > > > > > > > > > While removing boilerplate code, we had better not create another boilerplate. > > > > > Instead of expecting test writers to always call _cleanup > > > > > if we always want _cleanup to be called we can always implicitly > > > > > chain it in _register_cleanup(): > > > > > > > > > > --- a/common/preamble > > > > > +++ b/common/preamble > > > > > @@ -20,7 +20,7 @@ _register_cleanup() > > > > > shift > > > > > > > > > > test -n "$cleanup" && cleanup="${cleanup}; " > > > > > - trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $* > > > > > + trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $* > > > > > } > > > > > > > > I considered that, but then I found the _no_cleanup cases. IOWs, > > > > this doesn't work for the cases where we want to prevent the generic > > > > _cleanup function from being run on failure/test exit. Hence the > > > > cleanup function stacking behaviour rather than unconditional > > > > calling of _cleanup as per above. > > > > > > > > > > I didn't know about those. > > > Since you went to all this trouble to find them, can you provide a reference. > > > I wonder, what could ever be the reason not to want to rm $tmp.*? > > > > [PATCH 6/8] fstests: consolidate no cleanup test setup > > > > Ah I see. > It might have been better to explicitly opt-out of cleanup > only for those tests via _register_no_cleanup or _unregister_cleanup Premature optimisation. -Dave.
On Tue, May 24, 2022 at 3:28 PM Dave Chinner <david@fromorbit.com> wrote: > > On Tue, May 24, 2022 at 03:14:39PM +0300, Amir Goldstein wrote: > > On Tue, May 24, 2022 at 1:13 PM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Tue, May 24, 2022 at 01:01:57PM +0300, Amir Goldstein wrote: > > > > On Tue, May 24, 2022 at 12:57 PM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > > > On Tue, May 24, 2022 at 11:29:17AM +0300, Amir Goldstein wrote: > > > > > > On Tue, May 24, 2022 at 11:01 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > > > > > > > Hi folks, > > > > > > > > > > > > > > I pulled on a string a couple of days ago, and it got out of > > > > > > > control. It all started when I went to kill a test with ctrl-c and > > > > > > > it, once again, left background processes running that I had to hunt > > > > > > > down and kill manually. > > > > > > > > > > > > > > I then started looking a why this keeps happening, and realised that > > > > > > > the way we clean up on test completion is messy, inconsistent and > > > > > > > frequently buggy. So I started cleaning it all up, starting with the > > > > > > > tests/xfs directory because I saw a lot of low hanging fruit there. > > > > > > > > > > > > > > Essentially, we use _cleanup() functions as a way of overriding the > > > > > > > default trap handler we install in _begin_fstest(). Rather than > > > > > > > register a new handler, we just redefine the common cleanup function > > > > > > > and re-implement it (poorly) in every test that does an override. > > > > > > > Often these overrides are completely unnecessary - I think I reduced > > > > > > > the total number of overrides in tests/xfs by ~30% (~190 -> ~125), > > > > > > > and I reudced the number of *unique overrides by a lot more than > > > > > > > that. > > > > > > > > > > > > > > > > > > > That looks like an awesome improvement! > > > > > > > > > > > > > The method for overriding changes to be "stacked cleanups" rather > > > > > > > than "duplicated cleanups". That is, tests no longer open code: > > > > > > > > > > > > > > cd / > > > > > > > rm -rf $tmp.* > > > > > > > > > > > > > > THis is what common/preamble::_cleanup() does. We should call that > > > > > > > function to do this. Hence if we have a local cleanup that we need > > > > > > > to do, it becomes: > > > > > > > > > > > > > > local_cleanup() > > > > > > > { > > > > > > > rm -f $testfile > > > > > > > _cleanup > > > > > > > } > > > > > > > _register_cleanup local_cleanup > > > > > > > > > > > > While removing boilerplate code, we had better not create another boilerplate. > > > > > > Instead of expecting test writers to always call _cleanup > > > > > > if we always want _cleanup to be called we can always implicitly > > > > > > chain it in _register_cleanup(): > > > > > > > > > > > > --- a/common/preamble > > > > > > +++ b/common/preamble > > > > > > @@ -20,7 +20,7 @@ _register_cleanup() > > > > > > shift > > > > > > > > > > > > test -n "$cleanup" && cleanup="${cleanup}; " > > > > > > - trap "${cleanup}exit \$status" EXIT HUP INT QUIT TERM $* > > > > > > + trap "${cleanup}_cleanup; exit \$status" EXIT HUP INT QUIT TERM $* > > > > > > } > > > > > > > > > > I considered that, but then I found the _no_cleanup cases. IOWs, > > > > > this doesn't work for the cases where we want to prevent the generic > > > > > _cleanup function from being run on failure/test exit. Hence the > > > > > cleanup function stacking behaviour rather than unconditional > > > > > calling of _cleanup as per above. > > > > > > > > > > > > > I didn't know about those. > > > > Since you went to all this trouble to find them, can you provide a reference. > > > > I wonder, what could ever be the reason not to want to rm $tmp.*? > > > > > > [PATCH 6/8] fstests: consolidate no cleanup test setup > > > > > > > Ah I see. > > It might have been better to explicitly opt-out of cleanup > > only for those tests via _register_no_cleanup or _unregister_cleanup > > Premature optimisation. > It's not an optimization at all. The purpose of doing it this way would be to eliminate the human mistakes (from which you found so many in this work) and create a situation where _cleanup() is called for all tests, unless the test author really wants to avoid it. But anyway, no need to fix everything at one go. There will still be plenty of ways that a test author can write a test that does not clean after itself. Thanks, Amir.