diff mbox series

test-lib.sh: try to re-chmod & retry on failed trash removal

Message ID patch-1.1-d7e88a77fef-20211009T133043Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series test-lib.sh: try to re-chmod & retry on failed trash removal | expand

Commit Message

Ævar Arnfjörð Bjarmason Oct. 9, 2021, 1:31 p.m. UTC
This fixes a regression in [1] where t0004-unwritable.sh was made to
use "test_when_finished" for cleanup, we wouldn't re-chmod the
".git/objects" on failure, leading to a persistent error when running
the test.

This can be demonstrated as e.g. (output snipped for less verbosity):

    $ ./t0004-unwritable.sh --run=3 --immediate
    ok 1 # skip setup (--run)
    ok 2 # skip write-tree should notice unwritable repository (--run)
    not ok 3 - commit should notice unwritable repository
    [...]
    $ ./t0004-unwritable.sh --run=3 --immediate
    rm: cannot remove '[...]/trash directory.t0004-unwritable/.git/objects/info': Permission denied
    FATAL: Cannot prepare test area
    [...]

I.e. if any of its tests failed, and the tests were being run under
"--debug"[2] or "--immediate"[3] (which was introduced after [1]) we
wouldn't re-chmod the object directory. We'd then fail on the next run
since the test setup couldn't remove the trash files.

Instead of some version of reverting [1] let's make the test-lib.sh
resilient to this edge-case, it will happen due to [1], but also
e.g. if the relevant "test-lib.sh" process is kill -9'd during the
test run. We should try harder to recover in this case. If we fail to
remove the test directory let's retry after (re-)chmod-ing it.

This doesn't need to be guarded by something that's equivalent to
"POSIXPERM" since if we don't support "chmod" we were about to fail
anyway. Let's also discard any error output from (a possibly
nonexisting) "chmod", we'll fail on the subsequent "rm -rf"
anyway.

The lack of &&-chaining between the "chmod" and "rm -rf" is
intentional, if we fail the first "rm -rf", can't chmod, but then
succeed the second time around that's what we were hoping for. We just
want to nuke the directory, not carry forward every possible error
code or error message.

1. dbda967684d (t0004 (unwritable files): simplify error handling,
   2010-09-06)
2. 5a4a088add3 (test-lib: do not remove trash_directory if called with
   --debug, 2008-08-21)
3. b586744a864 (test: skip clean-up when running under --immediate
   mode, 2011-06-27)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/test-lib.sh | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

SZEDER Gábor Oct. 10, 2021, 9:36 p.m. UTC | #1
On Sat, Oct 09, 2021 at 03:31:02PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This fixes a regression in [1] where t0004-unwritable.sh was made to
> use "test_when_finished" for cleanup, we wouldn't re-chmod the
> ".git/objects" on failure, leading to a persistent error when running
> the test.
> 
> This can be demonstrated as e.g. (output snipped for less verbosity):
> 
>     $ ./t0004-unwritable.sh --run=3 --immediate
>     ok 1 # skip setup (--run)
>     ok 2 # skip write-tree should notice unwritable repository (--run)
>     not ok 3 - commit should notice unwritable repository
>     [...]
>     $ ./t0004-unwritable.sh --run=3 --immediate
>     rm: cannot remove '[...]/trash directory.t0004-unwritable/.git/objects/info': Permission denied
>     FATAL: Cannot prepare test area
>     [...]
> 
> I.e. if any of its tests failed, and the tests were being run under
> "--debug"[2] or "--immediate"[3] (which was introduced after [1]) we

'--debug' runs the tests as usual, including any 'test_when_finished'
commands, so I don't see how it would be affected.  And indeed running
'./t0004-unwritable.sh --run=3 --debug' repeatedly doesn't have any
issues with removing the trash directory of the previous run.

> wouldn't re-chmod the object directory. We'd then fail on the next run
> since the test setup couldn't remove the trash files.
> 
> Instead of some version of reverting [1] let's make the test-lib.sh
> resilient to this edge-case, it will happen due to [1], but also
> e.g. if the relevant "test-lib.sh" process is kill -9'd during the
> test run. We should try harder to recover in this case. If we fail to
> remove the test directory let's retry after (re-)chmod-ing it.
> 
> This doesn't need to be guarded by something that's equivalent to
> "POSIXPERM" since if we don't support "chmod" we were about to fail
> anyway. Let's also discard any error output from (a possibly
> nonexisting) "chmod", we'll fail on the subsequent "rm -rf"
> anyway.
> 
> The lack of &&-chaining between the "chmod" and "rm -rf" is
> intentional, if we fail the first "rm -rf", can't chmod, but then
> succeed the second time around that's what we were hoping for. We just
> want to nuke the directory, not carry forward every possible error
> code or error message.
> 
> 1. dbda967684d (t0004 (unwritable files): simplify error handling,
>    2010-09-06)
> 2. 5a4a088add3 (test-lib: do not remove trash_directory if called with
>    --debug, 2008-08-21)
> 3. b586744a864 (test: skip clean-up when running under --immediate
>    mode, 2011-06-27)
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/test-lib.sh | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index aa1ad8180ed..706ce0cdeb9 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1210,10 +1210,10 @@ test_done () {
>  			error "Tests passed but trash directory already removed before test cleanup; aborting"
>  
>  			cd "$TRASH_DIRECTORY/.." &&
> -			rm -fr "$TRASH_DIRECTORY" || {
> +			remove_trash_directory "$TRASH_DIRECTORY" || {
>  				# try again in a bit
>  				sleep 5;
> -				rm -fr "$TRASH_DIRECTORY"
> +				remove_trash_directory "$TRASH_DIRECTORY"
>  			} ||
>  			error "Tests passed but test cleanup failed; aborting"
>  		fi
> @@ -1370,6 +1370,18 @@ then
>  	exit 1
>  fi
>  
> +# Try really hard to clean up our mess
> +remove_trash_directory() {
> +	dir="$1"
> +	if ! rm -rf "$dir"
> +	then
> +		say_color info >&3 "Failed to remove trash directory, trying to re-chmod it first..."

Is this message really necessary?  I don't think it is, and there was
no similar message in the previous hunk around that 'sleep 5' either.

> +		chmod -R u+w "$dir" 2>/dev/null
> +		rm -rf "$dir"
> +	fi
> +	! test -d "$dir"
> +}
> +
>  # Are we running this test at all?
>  remove_trash=
>  this_test=${0##*/}
> @@ -1388,7 +1400,7 @@ GNUPGHOME="$HOME/gnupg-home-not-used"
>  export HOME GNUPGHOME USER_HOME
>  
>  # Test repository
> -rm -fr "$TRASH_DIRECTORY" || {
> +remove_trash_directory "$TRASH_DIRECTORY" || {
>  	GIT_EXIT_OK=t
>  	echo >&5 "FATAL: Cannot prepare test area"
>  	exit 1
> -- 
> 2.33.0.1492.gcc3b81fc59a
>
Junio C Hamano Oct. 10, 2021, 10:14 p.m. UTC | #2
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> This can be demonstrated as e.g. (output snipped for less verbosity):
>
>     $ ./t0004-unwritable.sh --run=3 --immediate
>     ok 1 # skip setup (--run)
>     ok 2 # skip write-tree should notice unwritable repository (--run)
>     not ok 3 - commit should notice unwritable repository
>     [...]
>     $ ./t0004-unwritable.sh --run=3 --immediate
>     rm: cannot remove '[...]/trash directory.t0004-unwritable/.git/objects/info': Permission denied
>     FATAL: Cannot prepare test area
>     [...]
>
> I.e. if any of its tests failed, and the tests were being run under
> "--debug"[2] or "--immediate"[3] (which was introduced after [1]) we
> wouldn't re-chmod the object directory. We'd then fail on the next run
> since the test setup couldn't remove the trash files.

OK, the first thing these scripts do is to create $TRASH_DIRECTORY
for their own use, and that is what fails (hence "Cannot prepare").

And making sure we try harder to prepare would be a good thing.
Given that understanding of the motivation... 

> @@ -1210,10 +1210,10 @@ test_done () {

...it is puzzling why we are touching the code that happens _after_
a test is run.  If our philosophy is to leave the mess after a test
fails so that the debugging person can take a look without
contaminating the forensics data, but blindly and forcibly clean up
before starting a new round, reusing the "do more than usual to
remove" logic meant to be used for the latter here feels counter
productive and also a bit counter intuitive.

>  			error "Tests passed but trash directory already removed before test cleanup; aborting"
>  
>  			cd "$TRASH_DIRECTORY/.." &&
> -			rm -fr "$TRASH_DIRECTORY" || {
> +			remove_trash_directory "$TRASH_DIRECTORY" || {
>  				# try again in a bit
>  				sleep 5;
> -				rm -fr "$TRASH_DIRECTORY"
> +				remove_trash_directory "$TRASH_DIRECTORY"
>  			} ||
>  			error "Tests passed but test cleanup failed; aborting"
>  		fi



> +# Try really hard to clean up our mess
> +remove_trash_directory() {
> +	dir="$1"
> +	if ! rm -rf "$dir"
> +	then
> +		say_color info >&3 "Failed to remove trash directory, trying to re-chmod it first..."
> +		chmod -R u+w "$dir" 2>/dev/null

If a test lost searchable bit from directories, "u+wx" may be
necessary to clean fully, no?

> +		rm -rf "$dir"
> +	fi
> +	! test -d "$dir"
> +}
> +

On the other hand...

>  # Are we running this test at all?
>  remove_trash=
>  this_test=${0##*/}
> @@ -1388,7 +1400,7 @@ GNUPGHOME="$HOME/gnupg-home-not-used"
>  export HOME GNUPGHOME USER_HOME
>  
>  # Test repository
> -rm -fr "$TRASH_DIRECTORY" || {
> +remove_trash_directory "$TRASH_DIRECTORY" || {

... this one does make sense and matches what the proposed log
message sold us the change as.

>  	GIT_EXIT_OK=t
>  	echo >&5 "FATAL: Cannot prepare test area"
>  	exit 1

Thanks.
Junio C Hamano Oct. 11, 2021, 4:34 a.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

>> +		chmod -R u+w "$dir" 2>/dev/null
>
> If a test lost searchable bit from directories, "u+wx" may be
> necessary to clean fully, no?

Sorry, but I was stupid.

If we were to worry about losing the searchable bit, then I should
considered the possibility that we may also lose the readable bit,
too.
diff mbox series

Patch

diff --git a/t/test-lib.sh b/t/test-lib.sh
index aa1ad8180ed..706ce0cdeb9 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1210,10 +1210,10 @@  test_done () {
 			error "Tests passed but trash directory already removed before test cleanup; aborting"
 
 			cd "$TRASH_DIRECTORY/.." &&
-			rm -fr "$TRASH_DIRECTORY" || {
+			remove_trash_directory "$TRASH_DIRECTORY" || {
 				# try again in a bit
 				sleep 5;
-				rm -fr "$TRASH_DIRECTORY"
+				remove_trash_directory "$TRASH_DIRECTORY"
 			} ||
 			error "Tests passed but test cleanup failed; aborting"
 		fi
@@ -1370,6 +1370,18 @@  then
 	exit 1
 fi
 
+# Try really hard to clean up our mess
+remove_trash_directory() {
+	dir="$1"
+	if ! rm -rf "$dir"
+	then
+		say_color info >&3 "Failed to remove trash directory, trying to re-chmod it first..."
+		chmod -R u+w "$dir" 2>/dev/null
+		rm -rf "$dir"
+	fi
+	! test -d "$dir"
+}
+
 # Are we running this test at all?
 remove_trash=
 this_test=${0##*/}
@@ -1388,7 +1400,7 @@  GNUPGHOME="$HOME/gnupg-home-not-used"
 export HOME GNUPGHOME USER_HOME
 
 # Test repository
-rm -fr "$TRASH_DIRECTORY" || {
+remove_trash_directory "$TRASH_DIRECTORY" || {
 	GIT_EXIT_OK=t
 	echo >&5 "FATAL: Cannot prepare test area"
 	exit 1