diff mbox series

[2/2] diff tests: rewrite flakyness-causing test "aid"

Message ID patch-2.2-c2cb52b6605-20210413T122645Z-avarab@gmail.com (mailing list archive)
State New
Headers show
Series diff tests: un-flaky and post-gettext-poison cleanup | expand

Commit Message

Ævar Arnfjörð Bjarmason April 13, 2021, 12:28 p.m. UTC
If a new test is added to this "while read magic cmd" test facility
added in 3c2f75b590c (t4013: add tests for diff/log family output
options., 2006-06-26) but no test file is added it'll fail the first
time, but then succeed on subsequent runs as a new file has been added
in t4013.

Let's accomplish the same aim in way that doesn't cause subsequent
test runs to succeed. If we can't find the file we'll BUG out, and
suggest to the developer that they copy our "expect.new" file over,
unlike the previous "expect" file this won't be picked up on
subsequent runs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4013-diff-various.sh | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

Comments

Matheus Tavares Bernardino April 13, 2021, 2:44 p.m. UTC | #1
On Tue, Apr 13, 2021 at 9:31 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> If a new test is added to this "while read magic cmd" test facility
> added in 3c2f75b590c (t4013: add tests for diff/log family output
> options., 2006-06-26) but no test file is added it'll fail the first
> time, but then succeed on subsequent runs as a new file has been added
> in t4013.
>
> Let's accomplish the same aim in way that doesn't cause subsequent

s/in way/in a way/ ?

> test runs to succeed. If we can't find the file we'll BUG out, and
> suggest to the developer that they copy our "expect.new" file over,
> unlike the previous "expect" file this won't be picked up on
> subsequent runs.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/t4013-diff-various.sh | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 67f6411aff9..228ff100c61 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -200,10 +200,12 @@ do
>         esac
>         test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
>         pfx=$(printf "%04d" $test_count)
> -       expect="$TEST_DIRECTORY/t4013/diff.$test"
> +       expect_relative="t4013/diff.$test"
> +       expect="$TEST_DIRECTORY/$expect_relative"
>         actual="$pfx-diff.$test"
>
>         test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
> +               test_when_finished "rm $actual" &&

Nit: before these two patches, "$actual" was only removed when the
test succeeded. So, in case of failure, the failed output files would
still be there for debugging. It might be interesting to keep this
behavior and only remove "$actual" at the end of the test.

>                 {
>                         echo "$ git $cmd"
>                         case "$magic" in
> @@ -216,16 +218,19 @@ do
>                             -e "s/^\\(.*mixed; boundary=\"-*\\)$V\\(-*\\)\"\$/\\1g-i-t--v-e-r-s-i-o-n\2\"/"
>                         echo "\$"
>                 } >"$actual" &&
> -               if test -f "$expect"
> +
> +               if ! test -f "$expect"
>                 then
> -                       process_diffs "$actual" >actual &&
> -                       process_diffs "$expect" >expect &&
> -                       test_cmp expect actual
> -               else
> -                       # this is to help developing new tests.
> -                       cp "$actual" "$expect"
> -                       false
> -               fi
> +                       expect_new="$expect.new" &&
> +                       cp "$actual" "$expect_new" &&
> +                       BUG "Have no \"$expect_relative\", new test? The output is in \"$expect_new\", maybe use that?"
> +               fi &&
> +
> +               test_when_finished "rm actual" &&
> +               process_diffs "$actual" >actual &&
> +               test_when_finished "rm expect" &&
> +               process_diffs "$expect" >expect &&
> +               test_cmp expect actual
>         '
>  done <<\EOF
>  diff-tree initial

The rest LGTM, thanks.
Ævar Arnfjörð Bjarmason April 13, 2021, 7:01 p.m. UTC | #2
On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:

> On Tue, Apr 13, 2021 at 9:31 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> If a new test is added to this "while read magic cmd" test facility
>> added in 3c2f75b590c (t4013: add tests for diff/log family output
>> options., 2006-06-26) but no test file is added it'll fail the first
>> time, but then succeed on subsequent runs as a new file has been added
>> in t4013.
>>
>> Let's accomplish the same aim in way that doesn't cause subsequent
>
> s/in way/in a way/ ?

*nod*

>> test runs to succeed. If we can't find the file we'll BUG out, and
>> suggest to the developer that they copy our "expect.new" file over,
>> unlike the previous "expect" file this won't be picked up on
>> subsequent runs.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  t/t4013-diff-various.sh | 25 +++++++++++++++----------
>>  1 file changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
>> index 67f6411aff9..228ff100c61 100755
>> --- a/t/t4013-diff-various.sh
>> +++ b/t/t4013-diff-various.sh
>> @@ -200,10 +200,12 @@ do
>>         esac
>>         test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
>>         pfx=$(printf "%04d" $test_count)
>> -       expect="$TEST_DIRECTORY/t4013/diff.$test"
>> +       expect_relative="t4013/diff.$test"
>> +       expect="$TEST_DIRECTORY/$expect_relative"
>>         actual="$pfx-diff.$test"
>>
>>         test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
>> +               test_when_finished "rm $actual" &&
>
> Nit: before these two patches, "$actual" was only removed when the
> test succeeded. So, in case of failure, the failed output files would
> still be there for debugging. It might be interesting to keep this
> behavior and only remove "$actual" at the end of the test.

Either I'm missing something or you are, that's how test_when_finished
works.

It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
skip clean-up when running under --immediate mode, 2011-06-27)

Maybe there's some edge case where we'd like to keep the files that it's
not covering, but then we should patch it to do the right thing, not use
manual "rm" at the end instead.

>>                 {
>>                         echo "$ git $cmd"
>>                         case "$magic" in
>> @@ -216,16 +218,19 @@ do
>>                             -e "s/^\\(.*mixed; boundary=\"-*\\)$V\\(-*\\)\"\$/\\1g-i-t--v-e-r-s-i-o-n\2\"/"
>>                         echo "\$"
>>                 } >"$actual" &&
>> -               if test -f "$expect"
>> +
>> +               if ! test -f "$expect"
>>                 then
>> -                       process_diffs "$actual" >actual &&
>> -                       process_diffs "$expect" >expect &&
>> -                       test_cmp expect actual
>> -               else
>> -                       # this is to help developing new tests.
>> -                       cp "$actual" "$expect"
>> -                       false
>> -               fi
>> +                       expect_new="$expect.new" &&
>> +                       cp "$actual" "$expect_new" &&
>> +                       BUG "Have no \"$expect_relative\", new test? The output is in \"$expect_new\", maybe use that?"
>> +               fi &&
>> +
>> +               test_when_finished "rm actual" &&
>> +               process_diffs "$actual" >actual &&
>> +               test_when_finished "rm expect" &&
>> +               process_diffs "$expect" >expect &&
>> +               test_cmp expect actual
>>         '
>>  done <<\EOF
>>  diff-tree initial
>
> The rest LGTM, thanks.

Thanks a lot for the review!
Matheus Tavares Bernardino April 13, 2021, 7:55 p.m. UTC | #3
On Tue, Apr 13, 2021 at 4:01 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:
>
> > On Tue, Apr 13, 2021 at 9:31 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>  t/t4013-diff-various.sh | 25 +++++++++++++++----------
> >>  1 file changed, 15 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> >> index 67f6411aff9..228ff100c61 100755
> >> --- a/t/t4013-diff-various.sh
> >> +++ b/t/t4013-diff-various.sh
> >> @@ -200,10 +200,12 @@ do
> >>         esac
> >>         test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
> >>         pfx=$(printf "%04d" $test_count)
> >> -       expect="$TEST_DIRECTORY/t4013/diff.$test"
> >> +       expect_relative="t4013/diff.$test"
> >> +       expect="$TEST_DIRECTORY/$expect_relative"
> >>         actual="$pfx-diff.$test"
> >>
> >>         test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
> >> +               test_when_finished "rm $actual" &&
> >
> > Nit: before these two patches, "$actual" was only removed when the
> > test succeeded. So, in case of failure, the failed output files would
> > still be there for debugging. It might be interesting to keep this
> > behavior and only remove "$actual" at the end of the test.
>
> Either I'm missing something or you are, that's how test_when_finished
> works.
>
> It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
> skip clean-up when running under --immediate mode, 2011-06-27)

I was mostly thinking about the `artifacts` zip we get from our CI
when a test fails. I find the final trash dir quite useful for some
post-mortem analysis, especially to debug WIP tests that only fail
occasionally or test failures on OSes I don't have quick access to.
Junio C Hamano April 13, 2021, 9:55 p.m. UTC | #4
Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

> On Tue, Apr 13, 2021 at 4:01 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> > Nit: before these two patches, "$actual" was only removed when the
>> > test succeeded. So, in case of failure, the failed output files would
>> > still be there for debugging. It might be interesting to keep this
>> > behavior and only remove "$actual" at the end of the test.
>>
>> Either I'm missing something or you are, that's how test_when_finished
>> works.
>>
>> It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
>> skip clean-up when running under --immediate mode, 2011-06-27)
>
> I was mostly thinking about the `artifacts` zip we get from our CI
> when a test fails. I find the final trash dir quite useful for some
> post-mortem analysis, especially to debug WIP tests that only fail
> occasionally or test failures on OSes I don't have quick access to.

Yeah, without "--immediate" we do run the clean-up tasks registered
with test_when_finished, so it may be a problem.
Ævar Arnfjörð Bjarmason April 14, 2021, 6:22 a.m. UTC | #5
On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:

> On Tue, Apr 13, 2021 at 4:01 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:
>>
>> > On Tue, Apr 13, 2021 at 9:31 AM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >>  t/t4013-diff-various.sh | 25 +++++++++++++++----------
>> >>  1 file changed, 15 insertions(+), 10 deletions(-)
>> >>
>> >> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
>> >> index 67f6411aff9..228ff100c61 100755
>> >> --- a/t/t4013-diff-various.sh
>> >> +++ b/t/t4013-diff-various.sh
>> >> @@ -200,10 +200,12 @@ do
>> >>         esac
>> >>         test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
>> >>         pfx=$(printf "%04d" $test_count)
>> >> -       expect="$TEST_DIRECTORY/t4013/diff.$test"
>> >> +       expect_relative="t4013/diff.$test"
>> >> +       expect="$TEST_DIRECTORY/$expect_relative"
>> >>         actual="$pfx-diff.$test"
>> >>
>> >>         test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
>> >> +               test_when_finished "rm $actual" &&
>> >
>> > Nit: before these two patches, "$actual" was only removed when the
>> > test succeeded. So, in case of failure, the failed output files would
>> > still be there for debugging. It might be interesting to keep this
>> > behavior and only remove "$actual" at the end of the test.
>>
>> Either I'm missing something or you are, that's how test_when_finished
>> works.
>>
>> It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
>> skip clean-up when running under --immediate mode, 2011-06-27)
>
> I was mostly thinking about the `artifacts` zip we get from our CI
> when a test fails. I find the final trash dir quite useful for some
> post-mortem analysis, especially to debug WIP tests that only fail
> occasionally or test failures on OSes I don't have quick access to.

Ah, yes that's a problem we should solve, but I think we should not put
off migration to test_when_finished because of that.

The whole reason we use it is to clean up the work area for the next
test.

Thus if we do:

    git something >expected &&
    test_cmp expected actual &&
    rm expected actual

And "git something" segfaults it's only dumb luck that subsequent tests
don't fail in non-obvious ways due to the files being left behind.

If we could rely on this in general we could make everything under
test_when_finished a noop, but I think you'll find that'll fail a lot of
things in the test suite if we do that.

But the problem you cite is legitimate, but we should solve it with
something like:

 1. If we fail tests 58 and 67 out of 100 we should/could run these
    again at the end under some more exhaustive debugging mode where
    we'd save away all the intermediate files.

 2. Or, and test_when_finished helps here, we could set $PATH when it
    runs to a directory with a custom "rm", which would save away the
    temporary files, expecting that we'll tar them up if any of the
    tests whose files we saved failed (we'd delete the other ones).

 3. Similarly, wrap "rm" in some "cat and rm" for common cases like
    actual/expected files, and improve the verbose output to some
    "verbose if fails".
Junio C Hamano April 14, 2021, 6:35 a.m. UTC | #6
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:
> ...
>>> >>         actual="$pfx-diff.$test"
>>> >>
>>> >>         test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
>>> >> +               test_when_finished "rm $actual" &&
>>> >
>>> > Nit: before these two patches, "$actual" was only removed when the
>>> > test succeeded. So, in case of failure, the failed output files would
>>> > still be there for debugging. It might be interesting to keep this
>>> > behavior and only remove "$actual" at the end of the test.
> ...
> Ah, yes that's a problem we should solve, but I think we should not put
> off migration to test_when_finished because of that.
>
> The whole reason we use it is to clean up the work area for the next
> test.
>
> Thus if we do:
>
>     git something >expected &&
>     test_cmp expected actual &&
>     rm expected actual

Isn't it a poor example to use to argue for your particular change,
where $actual in the original is designed to be unique among tests,
in order to ensure that $actual files left after test pieces fail
would not interfere with the tests that come later?  IOW, there is
not a reason to remove $actual until the end of the test sequence,
is there?
Ævar Arnfjörð Bjarmason April 14, 2021, 7:38 a.m. UTC | #7
On Wed, Apr 14 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:
>> ...
>>>> >>         actual="$pfx-diff.$test"
>>>> >>
>>>> >>         test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
>>>> >> +               test_when_finished "rm $actual" &&
>>>> >
>>>> > Nit: before these two patches, "$actual" was only removed when the
>>>> > test succeeded. So, in case of failure, the failed output files would
>>>> > still be there for debugging. It might be interesting to keep this
>>>> > behavior and only remove "$actual" at the end of the test.
>> ...
>> Ah, yes that's a problem we should solve, but I think we should not put
>> off migration to test_when_finished because of that.
>>
>> The whole reason we use it is to clean up the work area for the next
>> test.
>>
>> Thus if we do:
>>
>>     git something >expected &&
>>     test_cmp expected actual &&
>>     rm expected actual
>
> Isn't it a poor example to use to argue for your particular change,
> where $actual in the original is designed to be unique among tests,
> in order to ensure that $actual files left after test pieces fail
> would not interfere with the tests that come later?  IOW, there is
> not a reason to remove $actual until the end of the test sequence,
> is there?

Not really, because you needed to read the rest of the test file to come
to that conclusion.

The point of using a helper that guarantees cleanup such as
test_when_finished or test_config over manual "git config" or "git rm"
isn't that we can prove that we need it because a later test needs the
cleanup, but that anyone can add new tests or functionality without
having to worry about cleaning up the existing trash directory.

So yes, it's not needed here, but that's only because we know the rest
of the tests don't have e.g. a test that does a:

    for file in t4013/*

Anyway, that's only half of the example I cited for why we should use
test_when_finished in general, the other half was noting that because
we've semantically annotated cleanup tasks we can smartly handle those
for debugging, e.g. stash the "removed" files away, and if the test
fails present them to the user.
Junio C Hamano April 14, 2021, 7:53 a.m. UTC | #8
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>>> Thus if we do:
>>>
>>>     git something >expected &&
>>>     test_cmp expected actual &&
>>>     rm expected actual
>>
>> Isn't it a poor example to use to argue for your particular change,
>> where $actual in the original is designed to be unique among tests,
>> in order to ensure that $actual files left after test pieces fail
>> would not interfere with the tests that come later?  IOW, there is
>> not a reason to remove $actual until the end of the test sequence,
>> is there?
>
> Not really, because you needed to read the rest of the test file to come
> to that conclusion.
>
> The point of using a helper that guarantees cleanup such as
> test_when_finished or test_config over manual "git config" or "git rm"
> isn't that we can prove that we need it because a later test needs the
> cleanup, but that anyone can add new tests or functionality without
> having to worry about cleaning up the existing trash directory.
>
> So yes, it's not needed here, but that's only because we know the rest
> of the tests don't have e.g. a test that does a:

In this particular case, $actual files are designed to be left
behind for failed test pieces, so that the tester can come back and
inspect them.  I probably should have said it a bit more strongly
than "there is not a reason to remove".  You SHOULD NOT remove and
that is why we had "check and then remove only upon success" there,
instead of test_when_finished.  We want them left for and only for
failing test pieces.

Please do not advocate for and encourage newbies who would be
reading the discussion from sidelines to use test_when_finished out
of dogmatic principle without thinking.  Even though there are valid
cases where test_when_finished is the perfect fit, in this
particular case, use of it is a clear regression.

Thanks.
Ævar Arnfjörð Bjarmason April 14, 2021, 10:08 a.m. UTC | #9
On Wed, Apr 14 2021, Junio C Hamano wrote:

["tl;dr" below]:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>>> Thus if we do:
>>>>
>>>>     git something >expected &&
>>>>     test_cmp expected actual &&
>>>>     rm expected actual
>>>
>>> Isn't it a poor example to use to argue for your particular change,
>>> where $actual in the original is designed to be unique among tests,
>>> in order to ensure that $actual files left after test pieces fail
>>> would not interfere with the tests that come later?  IOW, there is
>>> not a reason to remove $actual until the end of the test sequence,
>>> is there?
>>
>> Not really, because you needed to read the rest of the test file to come
>> to that conclusion.
>>
>> The point of using a helper that guarantees cleanup such as
>> test_when_finished or test_config over manual "git config" or "git rm"
>> isn't that we can prove that we need it because a later test needs the
>> cleanup, but that anyone can add new tests or functionality without
>> having to worry about cleaning up the existing trash directory.
>>
>> So yes, it's not needed here, but that's only because we know the rest
>> of the tests don't have e.g. a test that does a:
>
> In this particular case, $actual files are designed to be left
> behind for failed test pieces, so that the tester can come back and
> inspect them.  I probably should have said it a bit more strongly
> than "there is not a reason to remove".  You SHOULD NOT remove and
> that is why we had "check and then remove only upon success" there,
> instead of test_when_finished.  We want them left for and only for
> failing test pieces.

Yes, it's clear that it's designed to do that. I'm not disagreeing on
the intent of your commit in 2006 to set it up this way.

What I am saying is that it's incompatible to have:

 1. Failing tests
 2. Not removing scratch files that would otherwise be removed

And:

 3. Knowing that the rest of the tests pass in the case of #1 without
    reading them all.

Hence the suggestion that we should use test_when_finished without
exception for such patterns.

Because consistently using the helper would allow us to smartly get #1
and #2 without #3, i.e. some "copy them for later analysis" as suggested
upthread in:
http://lore.kernel.org/git/874kg92xn0.fsf@evledraar.gmail.com

> Please do not advocate for and encourage newbies who would be
> reading the discussion from sidelines to use test_when_finished out
> of dogmatic principle without thinking.

Is Matheus the newbie here? I think he's contributed enough to form his
own opinion.

I think it's clear to anyone else reading from the sidelines that we're
having some informed disagreement about a finer point of patterns in the
test suite.

I doubt either one of us is likely to have much of an impact on newbies
reading from the sidelines.

In any case, I don't see how you're able to read this thread and come to
the conclusion that I'm proposing the use of test_when_finished out of
"[some] dogmatic principle".

I'm not. I'm proposing to use it because I think it makes sense, and I
think the reasons you've noted for avoiding it in this case have more
downsides than upsites, as noted in the #1-3 examples above.

> Even though there are valid cases where test_when_finished is the
> perfect fit, in this particular case, use of it is a clear regression.

Well, I disagree with you.

I think even if the patch I was proposing was a mere conversion to
test_when_finished as would be typical for most newly written tests,
i.e. a diff like this:
	
	diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
	index 6cca8b84a6b..d0031aa0f7b 100755
	--- a/t/t4013-diff-various.sh
	+++ b/t/t4013-diff-various.sh
	@@ -204,6 +204,7 @@ do
	 	actual="$pfx-diff.$test"
	 
	 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
	+		test_when_finished "rm \"$actual\"" &&
	 		{
	 			echo "$ git $cmd"
	 			case "$magic" in
	@@ -216,22 +217,11 @@ do
	 			    -e "s/^\\(.*mixed; boundary=\"-*\\)$V\\(-*\\)\"\$/\\1g-i-t--v-e-r-s-i-o-n\2\"/"
	 			echo "\$"
	 		} >"$actual" &&
	-		if test -f "$expect"
	-		then
	-			process_diffs "$actual" >actual &&
	-			process_diffs "$expect" >expect &&
	-			case $cmd in
	-			*format-patch* | *-stat*)
	-				test_cmp expect actual;;
	-			*)
	-				test_cmp expect actual;;
	-			esac &&
	-			rm -f "$actual" actual expect
	-		else
	-			# this is to help developing new tests.
	-			cp "$actual" "$expect"
	-			false
	-		fi
	+		test_when_finished "rm actual" &&
	+		process_diffs "$actual" >actual &&
	+		test_when_finished "rm expect" &&
	+		process_diffs "$expect" >expect &&
	+		test_cmp expect actual
	 	'
	 done <<\EOF
	 diff-tree initial

It's a clear improvement, because, on current master, emulating the case
when you add a new test-case:

    $ rm t4013/diff.log_-GF_-p_--pickaxe-all_master

Now run it once:

    $ ./t4013-diff-various.sh
    [...]
    # failed 1 among 199 test(s)

And then to debug it:

    $ ./t4013-diff-various.sh -vixd
    [...]
    # passed all 199 test(s)

So because we copied the file around the only get the failure the first
time around.

You might rightly say that we could have narrowly just fixed that
particular bug and kept the old "copy and return false", so something
like:
	
	diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
	index 6cca8b84a6b..6f0f3c7f53c 100755
	--- a/t/t4013-diff-various.sh
	+++ b/t/t4013-diff-various.sh
	@@ -229,7 +229,7 @@ do
	 			rm -f "$actual" actual expect
	 		else
	 			# this is to help developing new tests.
	-			cp "$actual" "$expect"
	+			cp "$actual" "$expect.MAYBE-USE-THIS"
	 			false
	 		fi
	 	'

But as noted above that gets us into needing to worry about cascading
failure. I.e. any new test added later in the file needs to think about
"what if the t4013 directory is in this state?".

We also document what we'll do and how we'll do it in t/README in this
case:
	
	--immediate::
		This causes the test to immediately exit upon the first
		failed test. Cleanup commands requested with
		test_when_finished are not executed if the test failed,
		in order to keep the state for inspection by the tester
		to diagnose the bug.

So it's much more obvious and consistent to have a failure that ends on
the test_cmp and being able to see the test_when_finished for the
intermediate files in the trace output.

The hypothetical newbie might have debugged such a pattern already, but
not having it end in a special-case used only for this test:

    [...]
    + test -f /home/avar/g/git/t/t4013/diff.log_-GF_-p_--pickaxe-all_master
    + cp 0096-diff.log_-GF_-p_--pickaxe-all_master /home/avar/g/git/t/t4013/diff.log_-GF_-p_--pickaxe-all_master.MAYBE-USE-THIS
    + false

In any case, neither one of those is the patch I'm suggesting upthread.

I am suggesting not just to use test_when_finished, but that we should
use BUG here. Since not having the file itself is a bug in git's test
suite, not a mere failure of the test case. The test will never pass if
the file doesn't exist.

So even if I agreed with you that the more narrow migration to
test_when_finished would be a regression, I don't see how you think
*this* patch could be a regression.

You seem to just be focusing on the test_when_finished part of it, but
that's not all it's doing. Since we also use BUG the output is now:

    $ ./t4013-diff-various.sh 
    [...]
    error: bug in the test script: Have no "t4013/diff.log_-GF_-p_--pickaxe-all_master", new test? [...]

I.e. whether we used "rm" or "cp" or "test_when_finished 'rm [...]'"
earlier has become irrelevant to achieve the states aims in 3c2f75b590c
(t4013: add tests for diff/log family output options., 2006-06-26).

It doesn't matter how we're removing/copying etc. the file anymore,
that's no longer how we're helping the user debug the test.

tl;dr:

In any case, I think it's best to just drop this series. I wrote the
above more as a synopsis of what I belive we should be doing in general,
I do have some patches queued up to address the general debug-ability of
our test output (e.g. for CI).

For that end-goal having e.g. test_when_finished (mostly) consistently
used would help a lot, but not using it in any one test would be
fine. This is just one such case, and something I thought wouldn't be a
controversial patch.

I think at this point me trying to gleaming entrails of what you want &
you responding back is going to take both of us much more time than
replacing this series with a patch you'd be happy with, and this wasn't
something I cared about more than a one-off.
Junio C Hamano April 15, 2021, 8:14 a.m. UTC | #10
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> What I am saying is that it's incompatible to have:
>
>  1. Failing tests
>  2. Not removing scratch files that would otherwise be removed
>
> And:
>
>  3. Knowing that the rest of the tests pass in the case of #1 without
>     reading them all.
>
> Hence the suggestion that we should use test_when_finished without
> exception for such patterns.

I disagree with the above; t4013's "read magic cmd" part is designed
to be independent from each other; I do not think you need to read
all of the parts enclosed in << ... EOF to understand that.

In short,

 * "test_when_finished 'rm ...'" is a good tool to ensure something
   gets removed no matter what else happens in the same test.  Since
   it does not run the clean-up under "-i", it can even be used on
   files that would be useful in diagnosing failures.  But under
   "-d", it does clean-up to avoid affecting the following test.

 * $actual was made unique so that even under "-d", a failing test
   would not negatively affect the subsequent ones.  Removing it for
   failure cases is actively wrong, so use of test_when_finished,
   which may be an expedient way to remove artifact that would
   negatively affect later test pieces, does not apply --- existing
   code is doing better than test_when_finished can offer, and
   replacing it with test_when_finished is a regression.

 * And the most important part: the unnecessary removal of $actual
   was not even part of the "flakyness-causing" bug you started this
   topic to fix anyway.  We should just remove the regression caused
   by unnecessary use of test_when_finished and focus on the "don't
   place the actual output from a brand new test under the file used
   for the expectation the next time---instead place it under
   temporary file and call for attention" part, which was the real
   improvement.

>> Please do not advocate for and encourage newbies who would be
>> reading the discussion from sidelines to use test_when_finished out
>> of dogmatic principle without thinking.
>
> Is Matheus the newbie here? I think he's contributed enough to form his
> own opinion.

No, those "reading the discussion from sidelines" are not on To: or
Cc:, but are eager to learn by reading what is available to git@
subscribers (including the lore archive).  I do not want those of us
whose name appear often in the list archive to be sending a wrong
signal to them.

> In any case, I think it's best to just drop this series.

That is a sad and wrong conclusion.  We should just realize what we
really wanted to fix in the first place and keep the improvement;
otherwise all the review time was wasted.

And next time, I'd suggest you not to spend too many bytes on "while
at it".  Clear and obvious clean-up while the tree and the project
is otherwise quiescent is very much no-brainer welcome, but
otherwise...

Thanks.
Ævar Arnfjörð Bjarmason April 15, 2021, 1:21 p.m. UTC | #11
On Thu, Apr 15 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> What I am saying is that it's incompatible to have:
>>
>>  1. Failing tests
>>  2. Not removing scratch files that would otherwise be removed
>>
>> And:
>>
>>  3. Knowing that the rest of the tests pass in the case of #1 without
>>     reading them all.
>>
>> Hence the suggestion that we should use test_when_finished without
>> exception for such patterns.
>
> I disagree with the above; t4013's "read magic cmd" part is designed
> to be independent from each other; I do not think you need to read
> all of the parts enclosed in << ... EOF to understand that.
>
> In short,
>
>  * "test_when_finished 'rm ...'" is a good tool to ensure something
>    gets removed no matter what else happens in the same test.  Since
>    it does not run the clean-up under "-i", it can even be used on
>    files that would be useful in diagnosing failures.  But under
>    "-d", it does clean-up to avoid affecting the following test.
>
>  * $actual was made unique so that even under "-d", a failing test
>    would not negatively affect the subsequent ones. [...]

I don't mind disagreeing with you, but I do feel like we're just talking
past each other, either that or we've got a different understanding of
"negatively affect the subsequent ones".

I'm saying that yes I agree that *right now* it doesn't negatively
affect any of the subsequent ones, but that you can only know that if
you need all the tests in the file past the one you're modifying.

I.e. imagine if we added, obviously contrived, but a stand-in for "we
expect consistent state":
	
	diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
	index 6cca8b84a6b..a326968e8f3 100755
	--- a/t/t4013-diff-various.sh
	+++ b/t/t4013-diff-various.sh
	@@ -526,4 +526,9 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
	 	test_i18ngrep "invalid regex given to -I: " error
	 '
	 
	+test_expect_success 'stuff' '
	+	find * -type f >actual &&
	+	test_line_count = 9 actual
	+'
	+
	 test_done

Now run it, and you'll pass them all, now break one of the earlier tests
with:

    >t4013/diff.log_-GF_-p_--pickaxe-all_master

You'll fail two tests (the expected one, and the new one), instead of
just one, which is purely an artifact of the earlier test's cleanup /
state being different as a result of whether it passes or not.

With my patch you'll fail just the one earlier test, because we do the
same cleanup no matter whether we failed midway through it or not.

So yes, while this small series amounts to re-arranging some deck chairs
on the Titanic I do think not being able to rely on consistent state is
a big anti-pattern in the test suite in general.

It's a big part of the reason for why you can't reliably use --run=N or
GIT_SKIP_TESTS (especially in cases of partial failures) to selectively
skip things without what amounts to bespoke analysis of each test you'd
like to run a subset of.

Now, of course test_when_finished won't get us there by itself, we'd
also need consistently use some "test_expect_setup" to clearly delineate
"setup" and other "this must run" state changes from "just consumes
existing state and resets". But consistently writing new/modified tests
as "just consumes existing state and resets" will slowly get us there.

>    [...] Removing it for
>    failure cases is actively wrong, so use of test_when_finished,
>    which may be an expedient way to remove artifact that would
>    negatively affect later test pieces, does not apply --- existing
>    code is doing better than test_when_finished can offer, and
>    replacing it with test_when_finished is a regression.

Aside from whether we agree or disagree on whether test_when_finished
should be consistently used to clean up state I genuinely still don't
see how the sum total of replacing the current "rm && cp" with
"test_when_finished && BUG" leaves us worse off.

Are you thinking of the case where someone wants to add N new lines to
the "while read magic cmd" loop?

That's the only case I can think of that would arguably be worse, but I
think it's far outweighed by the more obvious failure mode, and a "git
blame" on the lines it consumes shows that additions are very occasional
(and not in large batches).

Other than that, under e.g. -vixd having the test suite stop on the
specific line that failed (the test_cmp, not a "false" after some "cp")
is how most other tests works, and I think makes things much more
readable and understandable at a glance.

>  * And the most important part: the unnecessary removal of $actual
>    was not even part of the "flakyness-causing" bug you started this
>    topic to fix anyway.  We should just remove the regression caused
>    by unnecessary use of test_when_finished and focus on the "don't
>    place the actual output from a brand new test under the file used
>    for the expectation the next time---instead place it under
>    temporary file and call for attention" part, which was the real
>    improvement.

I see your point and agree in theory that the "we write into the
t/t4013/ directory" and "we inconsistently clear up the trash directory
on OK/NOK" are in principle different issues.

But in this case we always removed or copied "$actual" away from its
original name before, we do the same now (just more reliably, and in a
way that doesn't cause flakyness in the "cp" case).

So I don't really see how to split up those two arguably different
changes in a way that would make sense.

The only case where "$actual" was left before was (I think) a clear of
leaking state unnecessarily outside of --immediate.

>>> Please do not advocate for and encourage newbies who would be
>>> reading the discussion from sidelines to use test_when_finished out
>>> of dogmatic principle without thinking.
>>
>> Is Matheus the newbie here? I think he's contributed enough to form his
>> own opinion.
>
> No, those "reading the discussion from sidelines" are not on To: or
> Cc:, but are eager to learn by reading what is available to git@
> subscribers (including the lore archive).  I do not want those of us
> whose name appear often in the list archive to be sending a wrong
> signal to them.

I'll endeavor to make it clear that my opinion is just my opinion
etc. in the future, I thought it was clear from the context :)

>> In any case, I think it's best to just drop this series.
>
> That is a sad and wrong conclusion.  We should just realize what we
> really wanted to fix in the first place and keep the improvement;
> otherwise all the review time was wasted.

To be clear that's not some "I can't have the change I want so I'm
taking my toys and going home" tantrum. I don't mind making changes in
response to reviews etc.

As noted above I'm genuinely still unable to see how I'd selectively
untangle the change I made in a way that both wouldn't either fix bugs
in the test, or introduce new bugs while I was at it, and that I'd be
comfortable putting my name behind.

Hence the suggestion that since we're already we past the point of just
having come up with two independent approaches to this in terms of time
investment, that doing so would be a better thing overall.

I do think (and would be willing to work on) that t/README should have
some summary of best practices for new/updated tests that we should be
striving for. Looking at our "test_when_finished" there's at least tow
independent bugs[1] in it, so ther's some room for improvement.

1. It creates a ref with echo, but removes it with update-ref -d, which
   e.g. couldn't delete a corrupt ref (if rev-parse is buggy); and more
   importantly if rev-parse dies midway through we're left with a
   corrupt repo, the "test_when_finished ... update-ref -d" should come
   *first*.
SZEDER Gábor April 15, 2021, 9:26 p.m. UTC | #12
On Tue, Apr 13, 2021 at 04:55:23PM -0300, Matheus Tavares Bernardino wrote:
> On Tue, Apr 13, 2021 at 4:01 PM Ævar Arnfjörð Bjarmason
> > >> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> > >> index 67f6411aff9..228ff100c61 100755
> > >> --- a/t/t4013-diff-various.sh
> > >> +++ b/t/t4013-diff-various.sh
> > >> @@ -200,10 +200,12 @@ do
> > >>         esac
> > >>         test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
> > >>         pfx=$(printf "%04d" $test_count)
> > >> -       expect="$TEST_DIRECTORY/t4013/diff.$test"
> > >> +       expect_relative="t4013/diff.$test"
> > >> +       expect="$TEST_DIRECTORY/$expect_relative"
> > >>         actual="$pfx-diff.$test"
> > >>
> > >>         test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
> > >> +               test_when_finished "rm $actual" &&
> > >
> > > Nit: before these two patches, "$actual" was only removed when the
> > > test succeeded. So, in case of failure, the failed output files would
> > > still be there for debugging. It might be interesting to keep this
> > > behavior and only remove "$actual" at the end of the test.
> >
> > Either I'm missing something or you are, that's how test_when_finished
> > works.
> >
> > It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
> > skip clean-up when running under --immediate mode, 2011-06-27)
> 
> I was mostly thinking about the `artifacts` zip we get from our CI
> when a test fails. I find the final trash dir quite useful for some
> post-mortem analysis, especially to debug WIP tests that only fail
> occasionally or test failures on OSes I don't have quick access to.

On Travis CI we run tests with '--immediate' for exactly this reason;
I don't know why it's done differently on other CI systems, and,
unfortunately, 'git log --grep=immediate ci/' didn't turn up any
insights.
Junio C Hamano April 16, 2021, 6:32 p.m. UTC | #13
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Are you thinking of the case where someone wants to add N new lines to
> the "while read magic cmd" loop?

No.  I think stopping with BUG is fine.  Adding a new test is a
serious event and I'd prefer to make it stop to give it undistracted
attention by the developer, instead of letting many to be handled at
once.

Of course, we _could_ instead remember the fact that we deposted
newly proposed expectation pattern that the user must audit, still
keep going and list the new files at the end, but I do not think it
is necessary.  Your newly added BUG is perfectly fine. 

>>> In any case, I think it's best to just drop this series.
>>
>> That is a sad and wrong conclusion.  We should just realize what we
>> really wanted to fix in the first place and keep the improvement;
>> otherwise all the review time was wasted.
>
> To be clear that's not some "I can't have the change I want so I'm
> taking my toys and going home" tantrum. I don't mind making changes in
> response to reviews etc.
>
> As noted above I'm genuinely still unable to see how I'd selectively
> untangle the change I made in a way that both wouldn't either fix bugs
> in the test, or introduce new bugs while I was at it, and that I'd be
> comfortable putting my name behind.

I fail to see why the "rewrite flakyness-causing test aid" that you
set out to perform should need more than a two-liner change.  There
is no reason to touch '&& rm "$actual" actual expect' at the end.
It is the success case where the flakyness-causing "aid" does not
even kick in.

 t/t4013-diff-various.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git c/t/t4013-diff-various.sh w/t/t4013-diff-various.sh
index 6cca8b84a6..da6da13920 100755
--- c/t/t4013-diff-various.sh
+++ w/t/t4013-diff-various.sh
@@ -229,8 +229,8 @@ do
 			rm -f "$actual" actual expect
 		else
 			# this is to help developing new tests.
-			cp "$actual" "$expect"
-			false
+			mv "$actual" "$expect.new"
+			BUG "No $expect; check $expect.new and use it perhaps?"
 		fi
 	'
 done <<\EOF
n
diff mbox series

Patch

diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 67f6411aff9..228ff100c61 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -200,10 +200,12 @@  do
 	esac
 	test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
 	pfx=$(printf "%04d" $test_count)
-	expect="$TEST_DIRECTORY/t4013/diff.$test"
+	expect_relative="t4013/diff.$test"
+	expect="$TEST_DIRECTORY/$expect_relative"
 	actual="$pfx-diff.$test"
 
 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
+		test_when_finished "rm $actual" &&
 		{
 			echo "$ git $cmd"
 			case "$magic" in
@@ -216,16 +218,19 @@  do
 			    -e "s/^\\(.*mixed; boundary=\"-*\\)$V\\(-*\\)\"\$/\\1g-i-t--v-e-r-s-i-o-n\2\"/"
 			echo "\$"
 		} >"$actual" &&
-		if test -f "$expect"
+
+		if ! test -f "$expect"
 		then
-			process_diffs "$actual" >actual &&
-			process_diffs "$expect" >expect &&
-			test_cmp expect actual
-		else
-			# this is to help developing new tests.
-			cp "$actual" "$expect"
-			false
-		fi
+			expect_new="$expect.new" &&
+			cp "$actual" "$expect_new" &&
+			BUG "Have no \"$expect_relative\", new test? The output is in \"$expect_new\", maybe use that?"
+		fi &&
+
+		test_when_finished "rm actual" &&
+		process_diffs "$actual" >actual &&
+		test_when_finished "rm expect" &&
+		process_diffs "$expect" >expect &&
+		test_cmp expect actual
 	'
 done <<\EOF
 diff-tree initial