Message ID: patch-2.2-c2cb52b6605-20210413T122645Z-avarab@gmail.com
State: New, archived
Series: diff tests: un-flaky and post-gettext-poison cleanup
On Tue, Apr 13, 2021 at 9:31 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> If a new test is added to this "while read magic cmd" test facility
> added in 3c2f75b590c (t4013: add tests for diff/log family output
> options., 2006-06-26) but no test file is added it'll fail the first
> time, but then succeed on subsequent runs as a new file has been added
> in t4013.
>
> Let's accomplish the same aim in way that doesn't cause subsequent

s/in way/in a way/ ?

> test runs to succeed. If we can't find the file we'll BUG out, and
> suggest to the developer that they copy our "expect.new" file over,
> unlike the previous "expect" file this won't be picked up on
> subsequent runs.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/t4013-diff-various.sh | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 67f6411aff9..228ff100c61 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -200,10 +200,12 @@ do
> 	esac
> 	test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
> 	pfx=$(printf "%04d" $test_count)
> -	expect="$TEST_DIRECTORY/t4013/diff.$test"
> +	expect_relative="t4013/diff.$test"
> +	expect="$TEST_DIRECTORY/$expect_relative"
> 	actual="$pfx-diff.$test"
>
> 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
> +		test_when_finished "rm $actual" &&

Nit: before these two patches, "$actual" was only removed when the
test succeeded. So, in case of failure, the failed output files would
still be there for debugging. It might be interesting to keep this
behavior and only remove "$actual" at the end of the test.

> 		{
> 			echo "$ git $cmd"
> 			case "$magic" in
> @@ -216,16 +218,19 @@ do
> 			    -e "s/^\\(.*mixed; boundary=\"-*\\)$V\\(-*\\)\"\$/\\1g-i-t--v-e-r-s-i-o-n\2\"/"
> 			echo "\$"
> 		} >"$actual" &&
> +
> -		if test -f "$expect"
> +		if ! test -f "$expect"
> 		then
> -			process_diffs "$actual" >actual &&
> -			process_diffs "$expect" >expect &&
> -			test_cmp expect actual
> -		else
> -			# this is to help developing new tests.
> -			cp "$actual" "$expect"
> -			false
> -		fi
> +			expect_new="$expect.new" &&
> +			cp "$actual" "$expect_new" &&
> +			BUG "Have no \"$expect_relative\", new test? The output is in \"$expect_new\", maybe use that?"
> +		fi &&
> +
> +		test_when_finished "rm actual" &&
> +		process_diffs "$actual" >actual &&
> +		test_when_finished "rm expect" &&
> +		process_diffs "$expect" >expect &&
> +		test_cmp expect actual
> 	'
> done <<\EOF
> diff-tree initial

The rest LGTM, thanks.

On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:

> On Tue, Apr 13, 2021 at 9:31 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> If a new test is added to this "while read magic cmd" test facility
>> added in 3c2f75b590c (t4013: add tests for diff/log family output
>> options., 2006-06-26) but no test file is added it'll fail the first
>> time, but then succeed on subsequent runs as a new file has been added
>> in t4013.
>>
>> Let's accomplish the same aim in way that doesn't cause subsequent
>
> s/in way/in a way/ ?

*nod*

>> test runs to succeed. If we can't find the file we'll BUG out, and
>> suggest to the developer that they copy our "expect.new" file over,
>> unlike the previous "expect" file this won't be picked up on
>> subsequent runs.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  t/t4013-diff-various.sh | 25 +++++++++++++++----------
>>  1 file changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
>> index 67f6411aff9..228ff100c61 100755
>> --- a/t/t4013-diff-various.sh
>> +++ b/t/t4013-diff-various.sh
>> @@ -200,10 +200,12 @@ do
>> 	esac
>> 	test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
>> 	pfx=$(printf "%04d" $test_count)
>> -	expect="$TEST_DIRECTORY/t4013/diff.$test"
>> +	expect_relative="t4013/diff.$test"
>> +	expect="$TEST_DIRECTORY/$expect_relative"
>> 	actual="$pfx-diff.$test"
>>
>> 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
>> +		test_when_finished "rm $actual" &&
>
> Nit: before these two patches, "$actual" was only removed when the
> test succeeded. So, in case of failure, the failed output files would
> still be there for debugging. It might be interesting to keep this
> behavior and only remove "$actual" at the end of the test.

Either I'm missing something or you are, that's how test_when_finished
works.

It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
skip clean-up when running under --immediate mode, 2011-06-27)

Maybe there's some edge case where we'd like to keep the files that
it's not covering, but then we should patch it to do the right thing,
not use manual "rm" at the end instead.

>> 		{
>> 			echo "$ git $cmd"
>> 			case "$magic" in
>> @@ -216,16 +218,19 @@ do
>> 			    -e "s/^\\(.*mixed; boundary=\"-*\\)$V\\(-*\\)\"\$/\\1g-i-t--v-e-r-s-i-o-n\2\"/"
>> 			echo "\$"
>> 		} >"$actual" &&
>> +
>> -		if test -f "$expect"
>> +		if ! test -f "$expect"
>> 		then
>> -			process_diffs "$actual" >actual &&
>> -			process_diffs "$expect" >expect &&
>> -			test_cmp expect actual
>> -		else
>> -			# this is to help developing new tests.
>> -			cp "$actual" "$expect"
>> -			false
>> -		fi
>> +			expect_new="$expect.new" &&
>> +			cp "$actual" "$expect_new" &&
>> +			BUG "Have no \"$expect_relative\", new test? The output is in \"$expect_new\", maybe use that?"
>> +		fi &&
>> +
>> +		test_when_finished "rm actual" &&
>> +		process_diffs "$actual" >actual &&
>> +		test_when_finished "rm expect" &&
>> +		process_diffs "$expect" >expect &&
>> +		test_cmp expect actual
>> 	'
>> done <<\EOF
>> diff-tree initial
>
> The rest LGTM, thanks.

Thanks a lot for the review!

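The test_when_finished behavior referenced above (cleanup commands run after the test body, but are skipped for a failed test under --immediate so the trash directory stays around for inspection) can be sketched in plain POSIX shell. This is a simplified, hypothetical emulation for illustration only, not the real test-lib.sh code; the function and file names are made up.

```shell
#!/bin/sh
# Hypothetical, simplified emulation of test-lib.sh's test_when_finished:
# cleanup commands are queued during the test and run after the test
# body, EXCEPT when the test failed and --immediate mode asked to keep
# the trash directory for debugging.
cleanup=""

when_finished () {
	# Prepend, so cleanups run in reverse order of registration.
	cleanup="$*; $cleanup"
}

run_test () {
	# $1 = test body, $2 = non-empty emulates --immediate
	eval "$1"
	rc=$?
	if test $rc -ne 0 && test -n "$2"
	then
		: # skip cleanup; leave files behind for debugging
	else
		eval "$cleanup"
	fi
	cleanup=""
	return $rc
}

touch actual.tmp
when_finished "rm -f actual.tmp"
run_test "test -f actual.tmp" ""                # passes: cleanup runs
test -f actual.tmp || echo "pass: actual.tmp cleaned up"

touch failed.tmp
when_finished "rm -f failed.tmp"
run_test "test -f no-such-file" "yes" || true   # fails under "--immediate"
test -f failed.tmp && echo "fail+immediate: failed.tmp kept"
```

The second run demonstrates the point made above: with the real helper, a failing test under --immediate leaves its scratch files in place, so a manual "rm" at the end is not needed for debuggability.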
On Tue, Apr 13, 2021 at 4:01 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:
>
> > On Tue, Apr 13, 2021 at 9:31 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>  t/t4013-diff-various.sh | 25 +++++++++++++++----------
> >>  1 file changed, 15 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> >> index 67f6411aff9..228ff100c61 100755
> >> --- a/t/t4013-diff-various.sh
> >> +++ b/t/t4013-diff-various.sh
> >> @@ -200,10 +200,12 @@ do
> >> 	esac
> >> 	test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
> >> 	pfx=$(printf "%04d" $test_count)
> >> -	expect="$TEST_DIRECTORY/t4013/diff.$test"
> >> +	expect_relative="t4013/diff.$test"
> >> +	expect="$TEST_DIRECTORY/$expect_relative"
> >> 	actual="$pfx-diff.$test"
> >>
> >> 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
> >> +		test_when_finished "rm $actual" &&
> >
> > Nit: before these two patches, "$actual" was only removed when the
> > test succeeded. So, in case of failure, the failed output files would
> > still be there for debugging. It might be interesting to keep this
> > behavior and only remove "$actual" at the end of the test.
>
> Either I'm missing something or you are, that's how test_when_finished
> works.
>
> It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
> skip clean-up when running under --immediate mode, 2011-06-27)

I was mostly thinking about the `artifacts` zip we get from our CI
when a test fails. I find the final trash dir quite useful for some
post-mortem analysis, especially to debug WIP tests that only fail
occasionally or test failures on OSes I don't have quick access to.

Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

> On Tue, Apr 13, 2021 at 4:01 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> > Nit: before these two patches, "$actual" was only removed when the
>> > test succeeded. So, in case of failure, the failed output files would
>> > still be there for debugging. It might be interesting to keep this
>> > behavior and only remove "$actual" at the end of the test.
>>
>> Either I'm missing something or you are, that's how test_when_finished
>> works.
>>
>> It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
>> skip clean-up when running under --immediate mode, 2011-06-27)
>
> I was mostly thinking about the `artifacts` zip we get from our CI
> when a test fails. I find the final trash dir quite useful for some
> post-mortem analysis, especially to debug WIP tests that only fail
> occasionally or test failures on OSes I don't have quick access to.

Yeah, without "--immediate" we do run the clean-up tasks registered
with test_when_finished, so it may be a problem.

On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:

> On Tue, Apr 13, 2021 at 4:01 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:
>>
>> > On Tue, Apr 13, 2021 at 9:31 AM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >>  t/t4013-diff-various.sh | 25 +++++++++++++++----------
>> >>  1 file changed, 15 insertions(+), 10 deletions(-)
>> >>
>> >> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
>> >> index 67f6411aff9..228ff100c61 100755
>> >> --- a/t/t4013-diff-various.sh
>> >> +++ b/t/t4013-diff-various.sh
>> >> @@ -200,10 +200,12 @@ do
>> >> 	esac
>> >> 	test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
>> >> 	pfx=$(printf "%04d" $test_count)
>> >> -	expect="$TEST_DIRECTORY/t4013/diff.$test"
>> >> +	expect_relative="t4013/diff.$test"
>> >> +	expect="$TEST_DIRECTORY/$expect_relative"
>> >> 	actual="$pfx-diff.$test"
>> >>
>> >> 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
>> >> +		test_when_finished "rm $actual" &&
>> >
>> > Nit: before these two patches, "$actual" was only removed when the
>> > test succeeded. So, in case of failure, the failed output files would
>> > still be there for debugging. It might be interesting to keep this
>> > behavior and only remove "$actual" at the end of the test.
>>
>> Either I'm missing something or you are, that's how test_when_finished
>> works.
>>
>> It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
>> skip clean-up when running under --immediate mode, 2011-06-27)
>
> I was mostly thinking about the `artifacts` zip we get from our CI
> when a test fails. I find the final trash dir quite useful for some
> post-mortem analysis, especially to debug WIP tests that only fail
> occasionally or test failures on OSes I don't have quick access to.

Ah, yes that's a problem we should solve, but I think we should not put
off migration to test_when_finished because of that.

The whole reason we use it is to clean up the work area for the next
test. Thus if we do:

    git something >expected &&
    test_cmp expected actual &&
    rm expected actual

And "git something" segfaults it's only dumb luck that subsequent tests
don't fail in non-obvious ways due to the files being left behind. If
we could rely on this in general we could make everything under
test_when_finished a noop, but I think you'll find that'll fail a lot
of things in the test suite if we do that.

But the problem you cite is legitimate, but we should solve it with
something like:

 1. If we fail tests 58 and 67 out of 100 we should/could run these
    again at the end under some more exhaustive debugging mode where
    we'd save away all the intermediate files.

 2. Or, and test_when_finished helps here, we could set $PATH when it
    runs to a directory with a custom "rm", which would save away the
    temporary files, expecting that we'll tar them up if any of the
    tests whose files we saved failed (we'd delete the other ones).

 3. Similarly, wrap "rm" in some "cat and rm" for common cases like
    actual/expected files, and improve the verbose output to some
    "verbose if fails".

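The PATH-injected custom "rm" idea in item 2 could look something like the following hypothetical sketch. Nothing here is real test-lib.sh machinery; the directory names (wrapper-bin, stash-area) and the exact flag handling are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a PATH-injected "rm" replacement that stashes
# files into a save area instead of deleting them, so a CI artifact
# step could tar them up after a failed run.
mkdir -p stash-area wrapper-bin

cat >wrapper-bin/rm <<'EOF'
#!/bin/sh
# Skip option flags (-f, -r, ...); move each named file into the
# save area instead of removing it.
for f in "$@"
do
	case "$f" in -*) continue ;; esac
	if test -e "$f"
	then
		mv "$f" "stash-area/$(basename "$f").saved"
	fi
done
EOF
chmod +x wrapper-bin/rm

echo data >expect.tmp
(
	# Put the wrapper first in PATH for this one invocation only.
	PATH="$PWD/wrapper-bin:$PATH"
	export PATH
	rm -f expect.tmp    # runs the wrapper, not /bin/rm
)
test -f stash-area/expect.tmp.saved && echo "stashed, not removed"
```

A test harness would run the registered "rm ..." cleanup commands with such a PATH, then tar up stash-area only for tests that failed and delete it otherwise.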
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:
> ...
>>> >> 	actual="$pfx-diff.$test"
>>> >>
>>> >> 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
>>> >> +		test_when_finished "rm $actual" &&
>>> >
>>> > Nit: before these two patches, "$actual" was only removed when the
>>> > test succeeded. So, in case of failure, the failed output files would
>>> > still be there for debugging. It might be interesting to keep this
>>> > behavior and only remove "$actual" at the end of the test.
> ...
> Ah, yes that's a problem we should solve, but I think we should not put
> off migration to test_when_finished because of that.
>
> The whole reason we use it is to clean up the work area for the next
> test.
>
> Thus if we do:
>
>     git something >expected &&
>     test_cmp expected actual &&
>     rm expected actual

Isn't it a poor example to use to argue for your particular change,
where $actual in the original is designed to be unique among tests,
in order to ensure that $actual files left after test pieces fail
would not interfere with the tests that come later? IOW, there is
not a reason to remove $actual until the end of the test sequence,
is there?

On Wed, Apr 14 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Tue, Apr 13 2021, Matheus Tavares Bernardino wrote:
>> ...
>>>> >> 	actual="$pfx-diff.$test"
>>>> >>
>>>> >> 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
>>>> >> +		test_when_finished "rm $actual" &&
>>>> >
>>>> > Nit: before these two patches, "$actual" was only removed when the
>>>> > test succeeded. So, in case of failure, the failed output files would
>>>> > still be there for debugging. It might be interesting to keep this
>>>> > behavior and only remove "$actual" at the end of the test.
>> ...
>> Ah, yes that's a problem we should solve, but I think we should not put
>> off migration to test_when_finished because of that.
>>
>> The whole reason we use it is to clean up the work area for the next
>> test.
>>
>> Thus if we do:
>>
>>     git something >expected &&
>>     test_cmp expected actual &&
>>     rm expected actual
>
> Isn't it a poor example to use to argue for your particular change,
> where $actual in the original is designed to be unique among tests,
> in order to ensure that $actual files left after test pieces fail
> would not interfere with the tests that come later? IOW, there is
> not a reason to remove $actual until the end of the test sequence,
> is there?

Not really, because you needed to read the rest of the test file to come
to that conclusion.

The point of using a helper that guarantees cleanup such as
test_when_finished or test_config over manual "git config" or "git rm"
isn't that we can prove that we need it because a later test needs the
cleanup, but that anyone can add new tests or functionality without
having to worry about cleaning up the existing trash directory.

So yes, it's not needed here, but that's only because we know the rest
of the tests don't have e.g. a test that does a:

    for file in t4013/*

Anyway, that's only half of the example I cited for why we should use
test_when_finished in general, the other half was noting that because
we've semantically annotated cleanup tasks we can smartly handle those
for debugging, e.g. stash the "removed" files away, and if the test
fails present them to the user.

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>>> Thus if we do:
>>>
>>>     git something >expected &&
>>>     test_cmp expected actual &&
>>>     rm expected actual
>>
>> Isn't it a poor example to use to argue for your particular change,
>> where $actual in the original is designed to be unique among tests,
>> in order to ensure that $actual files left after test pieces fail
>> would not interfere with the tests that come later? IOW, there is
>> not a reason to remove $actual until the end of the test sequence,
>> is there?
>
> Not really, because you needed to read the rest of the test file to come
> to that conclusion.
>
> The point of using a helper that guarantees cleanup such as
> test_when_finished or test_config over manual "git config" or "git rm"
> isn't that we can prove that we need it because a later test needs the
> cleanup, but that anyone can add new tests or functionality without
> having to worry about cleaning up the existing trash directory.
>
> So yes, it's not needed here, but that's only because we know the rest
> of the tests don't have e.g. a test that does a:

In this particular case, $actual files are designed to be left behind
for failed test pieces, so that the tester can come back and inspect
them. I probably should have said it a bit more strongly than "there
is not a reason to remove". You SHOULD NOT remove and that is why we
had "check and then remove only upon success" there, instead of
test_when_finished. We want them left for and only for failing test
pieces.

Please do not advocate for and encourage newbies who would be reading
the discussion from sidelines to use test_when_finished out of
dogmatic principle without thinking. Even though there are valid cases
where test_when_finished is the perfect fit, in this particular case,
use of it is a clear regression.

Thanks.

On Wed, Apr 14 2021, Junio C Hamano wrote:

["tl;dr" below]:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>>> Thus if we do:
>>>>
>>>>     git something >expected &&
>>>>     test_cmp expected actual &&
>>>>     rm expected actual
>>>
>>> Isn't it a poor example to use to argue for your particular change,
>>> where $actual in the original is designed to be unique among tests,
>>> in order to ensure that $actual files left after test pieces fail
>>> would not interfere with the tests that come later? IOW, there is
>>> not a reason to remove $actual until the end of the test sequence,
>>> is there?
>>
>> Not really, because you needed to read the rest of the test file to come
>> to that conclusion.
>>
>> The point of using a helper that guarantees cleanup such as
>> test_when_finished or test_config over manual "git config" or "git rm"
>> isn't that we can prove that we need it because a later test needs the
>> cleanup, but that anyone can add new tests or functionality without
>> having to worry about cleaning up the existing trash directory.
>>
>> So yes, it's not needed here, but that's only because we know the rest
>> of the tests don't have e.g. a test that does a:
>
> In this particular case, $actual files are designed to be left
> behind for failed test pieces, so that the tester can come back and
> inspect them. I probably should have said it a bit more strongly
> than "there is not a reason to remove". You SHOULD NOT remove and
> that is why we had "check and then remove only upon success" there,
> instead of test_when_finished. We want them left for and only for
> failing test pieces.

Yes, it's clear that it's designed to do that. I'm not disagreeing on
the intent of your commit in 2006 to set it up this way.

What I am saying is that it's incompatible to have:

 1. Failing tests
 2. Not removing scratch files that would otherwise be removed

And:

 3. Knowing that the rest of the tests pass in the case of #1 without
    reading them all.

Hence the suggestion that we should use test_when_finished without
exception for such patterns. Because consistently using the helper
would allow us to smartly get #1 and #2 without #3, i.e. some "copy
them for later analysis" as suggested upthread in:
http://lore.kernel.org/git/874kg92xn0.fsf@evledraar.gmail.com

> Please do not advocate for and encourage newbies who would be
> reading the discussion from sidelines to use test_when_finished out
> of dogmatic principle without thinking.

Is Matheus the newbie here? I think he's contributed enough to form
his own opinion. I think it's clear to anyone else reading from the
sidelines that we're having some informed disagreement about a finer
point of patterns in the test suite. I doubt either one of us is
likely to have much of an impact on newbies reading from the
sidelines.

In any case, I don't see how you're able to read this thread and come
to the conclusion that I'm proposing the use of test_when_finished out
of "[some] dogmatic principle". I'm not. I'm proposing to use it
because I think it makes sense, and I think the reasons you've noted
for avoiding it in this case have more downsides than upsides, as
noted in the #1-3 examples above.

> Even though there are valid cases where test_when_finished is the
> perfect fit, in this particular case, use of it is a clear regression.

Well, I disagree with you. I think even if the patch I was proposing
was a mere conversion to test_when_finished as would be typical for
most newly written tests, i.e. a diff like this:

diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 6cca8b84a6b..d0031aa0f7b 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -204,6 +204,7 @@ do
 	actual="$pfx-diff.$test"
 
 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
+		test_when_finished "rm \"$actual\"" &&
 		{
 			echo "$ git $cmd"
 			case "$magic" in
@@ -216,22 +217,11 @@ do
 			    -e "s/^\\(.*mixed; boundary=\"-*\\)$V\\(-*\\)\"\$/\\1g-i-t--v-e-r-s-i-o-n\2\"/"
 			echo "\$"
 		} >"$actual" &&
-		if test -f "$expect"
-		then
-			process_diffs "$actual" >actual &&
-			process_diffs "$expect" >expect &&
-			case $cmd in
-			*format-patch* | *-stat*)
-				test_cmp expect actual;;
-			*)
-				test_cmp expect actual;;
-			esac &&
-			rm -f "$actual" actual expect
-		else
-			# this is to help developing new tests.
-			cp "$actual" "$expect"
-			false
-		fi
+		test_when_finished "rm actual" &&
+		process_diffs "$actual" >actual &&
+		test_when_finished "rm expect" &&
+		process_diffs "$expect" >expect &&
+		test_cmp expect actual
 	'
 done <<\EOF
 diff-tree initial

It's a clear improvement, because, on current master, emulating the
case when you add a new test-case:

    $ rm t4013/diff.log_-GF_-p_--pickaxe-all_master

Now run it once:

    $ ./t4013-diff-various.sh
    [...]
    # failed 1 among 199 test(s)

And then to debug it:

    $ ./t4013-diff-various.sh -vixd
    [...]
    # passed all 199 test(s)

So because we copied the file around we only get the failure the first
time around.

You might rightly say that we could have narrowly just fixed that
particular bug and kept the old "copy and return false", so something
like:

diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 6cca8b84a6b..6f0f3c7f53c 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -229,7 +229,7 @@ do
 			rm -f "$actual" actual expect
 		else
 			# this is to help developing new tests.
-			cp "$actual" "$expect"
+			cp "$actual" "$expect.MAYBE-USE-THIS"
 			false
 		fi
 '

But as noted above that gets us into needing to worry about cascading
failure. I.e. any new test added later in the file needs to think
about "what if the t4013 directory is in this state?".

We also document what we'll do and how we'll do it in t/README in this
case:

    --immediate::
	This causes the test to immediately exit upon the first
	failed test. Cleanup commands requested with
	test_when_finished are not executed if the test failed,
	in order to keep the state for inspection by the tester
	to diagnose the bug.

So it's much more obvious and consistent to have a failure that ends
on the test_cmp and being able to see the test_when_finished for the
intermediate files in the trace output. The hypothetical newbie might
have debugged such a pattern already, but not having it end in a
special-case used only for this test:

    [...]
    + test -f /home/avar/g/git/t/t4013/diff.log_-GF_-p_--pickaxe-all_master
    + cp 0096-diff.log_-GF_-p_--pickaxe-all_master /home/avar/g/git/t/t4013/diff.log_-GF_-p_--pickaxe-all_master.MAYBE-USE-THIS
    + false

In any case, neither one of those is the patch I'm suggesting
upthread. I am suggesting not just to use test_when_finished, but that
we should use BUG here. Since not having the file itself is a bug in
git's test suite, not a mere failure of the test case. The test will
never pass if the file doesn't exist.

So even if I agreed with you that the more narrow migration to
test_when_finished would be a regression, I don't see how you think
*this* patch could be a regression. You seem to just be focusing on
the test_when_finished part of it, but that's not all it's doing.
Since we also use BUG the output is now:

    $ ./t4013-diff-various.sh
    [...]
    error: bug in the test script: Have no "t4013/diff.log_-GF_-p_--pickaxe-all_master", new test?
    [...]

I.e. whether we used "rm" or "cp" or "test_when_finished 'rm [...]'"
earlier has become irrelevant to achieving the stated aims in
3c2f75b590c (t4013: add tests for diff/log family output options.,
2006-06-26). It doesn't matter how we're removing/copying etc. the
file anymore, that's no longer how we're helping the user debug the
test.

tl;dr: In any case, I think it's best to just drop this series.

I wrote the above more as a synopsis of what I believe we should be
doing in general, I do have some patches queued up to address the
general debug-ability of our test output (e.g. for CI). For that
end-goal having e.g. test_when_finished (mostly) consistently used
would help a lot, but not using it in any one test would be fine. This
is just one such case, and something I thought wouldn't be a
controversial patch.

I think at this point me trying to read the entrails of what you want
& you responding back is going to take both of us much more time than
replacing this series with a patch you'd be happy with, and this
wasn't something I cared about more than a one-off.

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> What I am saying is that it's incompatible to have:
>
> 1. Failing tests
> 2. Not removing scratch files that would otherwise be removed
>
> And:
>
> 3. Knowing that the rest of the tests pass in the case of #1 without
>    reading them all.
>
> Hence the suggestion that we should use test_when_finished without
> exception for such patterns.

I disagree with the above; t4013's "read magic cmd" part is designed
to be independent from each other; I do not think you need to read all
of the parts enclosed in << ... EOF to understand that.

In short,

 * "test_when_finished 'rm ...'" is a good tool to ensure something
   gets removed no matter what else happens in the same test. Since
   it does not run the clean-up under "-i", it can even be used on
   files that would be useful in diagnosing failures. But under
   "-d", it does clean-up to avoid affecting the following test.

 * $actual was made unique so that even under "-d", a failing test
   would not negatively affect the subsequent ones. Removing it for
   failure cases is actively wrong, so use of test_when_finished,
   which may be an expedient way to remove artifacts that would
   negatively affect later test pieces, does not apply --- the
   existing code is doing better than test_when_finished can offer,
   and replacing it with test_when_finished is a regression.

 * And the most important part: the unnecessary removal of $actual
   was not even part of the "flakyness-causing" bug you started this
   topic to fix anyway. We should just remove the regression caused
   by unnecessary use of test_when_finished and focus on the "don't
   place the actual output from a brand new test under the file used
   for the expectation the next time---instead place it under a
   temporary file and call for attention" part, which was the real
   improvement.

>> Please do not advocate for and encourage newbies who would be
>> reading the discussion from sidelines to use test_when_finished out
>> of dogmatic principle without thinking.
>
> Is Matheus the newbie here? I think he's contributed enough to form his
> own opinion.

No, those "reading the discussion from sidelines" are not on To: or
Cc:, but are eager to learn by reading what is available to git@
subscribers (including the lore archive). I do not want those of us
whose names appear often in the list archive to be sending a wrong
signal to them.

> In any case, I think it's best to just drop this series.

That is a sad and wrong conclusion. We should just realize what we
really wanted to fix in the first place and keep the improvement;
otherwise all the review time was wasted.

And next time, I'd suggest you not to spend too many bytes on "while
at it". Clear and obvious clean-up while the tree and the project is
otherwise quiescent is very much no-brainer welcome, but otherwise...

Thanks.

On Thu, Apr 15 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> What I am saying is that it's incompatible to have: >> >> 1. Failing tests >> 2. Not removing scratch files that would otherwise be removed >> >> And: >> >> 3. Knowing that the rest of the tests pass in the case of #1 without >> reading them all. >> >> Hence the suggestion that we should use test_when_finished without >> exception for such patterns. > > I disagree with the above; t4013's "read magic cmd" part is designed > to be independent from each other; I do not think you need to read > all of the parts enclosed in << ... EOF to understand that. > > In short, > > * "test_when_finished 'rm ...'" is a good tool to ensure something > gets removed no matter what else happens in the same test. Since > it does not run the clean-up under "-i", it can even be used on > files that would be useful in diagnosing failures. But under > "-d", it does clean-up to avoid affecting the following test. > > * $actual was made unique so that even under "-d", a failing test > would not negatively affect the subsequent ones. [...] I don't mind disagreeing with you, but I do feel like we're just talking past each other, either that or we've got a different understanding of "negatively affect the subsequent ones". I'm saying that yes I agree that *right now* it doesn't negatively affect any of the subsequent ones, but that you can only know that if you need all the tests in the file past the one you're modifying. I.e. 
imagine if we added, obviously contrived, but a stand-in for "we expect consistent state": diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh index 6cca8b84a6b..a326968e8f3 100755 --- a/t/t4013-diff-various.sh +++ b/t/t4013-diff-various.sh @@ -526,4 +526,9 @@ test_expect_success 'diff -I<regex>: detect malformed regex' ' test_i18ngrep "invalid regex given to -I: " error ' +test_expect_success 'stuff' ' + find * -type f >actual && + test_line_count = 9 actual +' + test_done Now run it, and you'll pass them all, now break one of the earlier tests with: >t4013/diff.log_-GF_-p_--pickaxe-all_master You'll fail two tests (the expected one, and the new one), instead of just one, which is purely an artifact of the earlier test's cleanup / state being different as a result of whether it passes or not. With my patch you'll fail just the one earlier test, because we do the same cleanup no matter whether we failed midway through it or not. So yes, while this small series amounts to re-arranging some deck chairs on the Titanic I do think not being able to rely on consistent state is a big anti-pattern in the test suite in general. It's a big part of the reason for why you can't reliably use --run=N or GIT_SKIP_TESTS (especially in cases of partial failures) to selectively skip things without what amounts to bespoke analysis of each test you'd like to run a subset of. Now, of course test_when_finished won't get us there by itself, we'd also need consistently use some "test_expect_setup" to clearly delineate "setup" and other "this must run" state changes from "just consumes existing state and resets". But consistently writing new/modified tests as "just consumes existing state and resets" will slowly get us there. > [...] 
Removing it for > failure cases is actively wrong, so use of test_when_finished, > which may be an expedient way to remove artifact that would > negatively affect later test pieces, does not apply --- existing > code is doing better than test_when_finished can offer, and > replacing it with test_when_finished is a regression. Aside from whether we agree or disagree on whether test_when_finished should be consistently used to clean up state I genuinely still don't see how the sum total of replacing the current "rm && cp" with "test_when_finished && BUG" leaves us worse off. Are you thinking of the case where someone wants to add N new lines to the "while read magic cmd" loop? That's the only case I can think of that would arguably be worse, but I think it's far outweighed by the more obvious failure mode, and a "git blame" on the lines it consumes shows that additions are very occasional (and not in large batches). Other than that, under e.g. -vixd having the test suite stop on the specific line that failed (the test_cmp, not a "false" after some "cp") is how most other tests works, and I think makes things much more readable and understandable at a glance. > * And the most important part: the unnecessary removal of $actual > was not even part of the "flakyness-causing" bug you started this > topic to fix anyway. We should just remove the regression caused > by unnecessary use of test_when_finished and focus on the "don't > place the actual output from a brand new test under the file used > for the expectation the next time---instead place it under > temporary file and call for attention" part, which was the real > improvement. I see your point and agree in theory that the "we write into the t/t4013/ directory" and "we inconsistently clear up the trash directory on OK/NOK" are in principle different issues. 
But in this case we always removed or copied "$actual" away from its
original name before, and we do the same now (just more reliably, and
in a way that doesn't cause flakiness in the "cp" case). So I don't
really see how to split up those two arguably different changes in a
way that would make sense. The only case where "$actual" was left
behind before was (I think) a clear case of leaking state
unnecessarily outside of --immediate.

>>> Please do not advocate for and encourage newbies who would be
>>> reading the discussion from sidelines to use test_when_finished out
>>> of dogmatic principle without thinking.
>>
>> Is Matheus the newbie here? I think he's contributed enough to form his
>> own opinion.
>
> No, those "reading the discussion from sidelines" are not on To: or
> Cc:, but are eager to learn by reading what is available to git@
> subscribers (including the lore archive). I do not want those of us
> whose name appear often in the list archive to be sending a wrong
> signal to them.

I'll endeavor to make it clear that my opinion is just my opinion
etc. in the future; I thought it was clear from the context :)

>> In any case, I think it's best to just drop this series.
>
> That is a sad and wrong conclusion. We should just realize what we
> really wanted to fix in the first place and keep the improvement;
> otherwise all the review time was wasted.

To be clear, that's not some "I can't have the change I want so I'm
taking my toys and going home" tantrum. I don't mind making changes in
response to reviews etc.

As noted above, I'm genuinely still unable to see how I'd selectively
untangle the change I made in a way that wouldn't either fix bugs in
the test or introduce new bugs while I was at it, and that I'd be
comfortable putting my name behind.

Hence the suggestion that, since we're already past the point in time
investment of having come up with two independent approaches to this,
dropping the series would be a better thing overall.
I do think t/README should have some summary of the best practices we
should be striving for in new/updated tests, and I'd be willing to
work on that. Looking at our "test_when_finished" there are at least
two independent bugs[1] in it, so there's some room for improvement.

1. It creates a ref with echo, but removes it with update-ref -d,
   which e.g. couldn't delete a corrupt ref (if rev-parse is buggy);
   and more importantly, if rev-parse dies midway through we're left
   with a corrupt repo. The "test_when_finished ... update-ref -d"
   should come *first*.
On Tue, Apr 13, 2021 at 04:55:23PM -0300, Matheus Tavares Bernardino wrote:
> On Tue, Apr 13, 2021 at 4:01 PM Ævar Arnfjörð Bjarmason
>
> > >> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> > >> index 67f6411aff9..228ff100c61 100755
> > >> --- a/t/t4013-diff-various.sh
> > >> +++ b/t/t4013-diff-various.sh
> > >> @@ -200,10 +200,12 @@ do
> > >>  	esac
> > >>  	test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
> > >>  	pfx=$(printf "%04d" $test_count)
> > >> -	expect="$TEST_DIRECTORY/t4013/diff.$test"
> > >> +	expect_relative="t4013/diff.$test"
> > >> +	expect="$TEST_DIRECTORY/$expect_relative"
> > >>  	actual="$pfx-diff.$test"
> > >>
> > >>  	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
> > >> +		test_when_finished "rm $actual" &&
> > >
> > > Nit: before these two patches, "$actual" was only removed when the
> > > test succeeded. So, in case of failure, the failed output files would
> > > still be there for debugging. It might be interesting to keep this
> > > behavior and only remove "$actual" at the end of the test.
> >
> > Either I'm missing something or you are; that's how test_when_finished
> > works.
> >
> > It's skipped under e.g. "--immediate --debug". See b586744a864 (test:
> > skip clean-up when running under --immediate mode, 2011-06-27)
>
> I was mostly thinking about the `artifacts` zip we get from our CI
> when a test fails. I find the final trash dir quite useful for some
> post-mortem analysis, especially to debug WIP tests that only fail
> occasionally or test failures on OSes I don't have quick access to.

On Travis CI we run tests with '--immediate' for exactly this reason;
I don't know why it's done differently on other CI systems, and,
unfortunately, 'git log --grep=immediate ci/' didn't turn up any
insights.
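To make the behavior being debated concrete, here is a stand-alone
sketch (again hypothetical helpers, not test-lib itself): registered
cleanups run whether the test body passed or failed, except under an
--immediate-style debugging mode that keeps the artifacts around for
post-mortem analysis.

```shell
#!/bin/sh
# Hypothetical sketch of test_when_finished-like semantics: cleanups
# run on failure too, unless an "immediate" debugging mode is on.

immediate=${1:-}
cleanup=
when_finished () { cleanup="$1; $cleanup"; }

run_cleanups () {
	if test -n "$immediate"
	then
		echo "keeping artifacts for debugging"
	else
		eval "$cleanup"
	fi
}

when_finished "rm -f leftover"
echo oops >leftover
test 1 = 2 || echo "test body failed"   # the test body fails...
run_cleanups                            # ...but cleanup still runs

test -f leftover && echo "leftover kept" || echo "leftover removed"
```

Run with no argument this prints "test body failed" then "leftover
removed"; run with any argument (standing in for --immediate) it keeps
the file for inspection, which is the trade-off discussed above.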
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Are you thinking of the case where someone wants to add N new lines to
> the "while read magic cmd" loop?

No. I think stopping with BUG is fine. Adding a new test is a serious
event and I'd prefer to make it stop to give it undistracted attention
from the developer, instead of letting many be handled at once. Of
course, we _could_ instead remember the fact that we deposited a newly
proposed expectation pattern that the user must audit, still keep
going, and list the new files at the end, but I do not think it is
necessary. Your newly added BUG is perfectly fine.

>>> In any case, I think it's best to just drop this series.
>>
>> That is a sad and wrong conclusion. We should just realize what we
>> really wanted to fix in the first place and keep the improvement;
>> otherwise all the review time was wasted.
>
> To be clear that's not some "I can't have the change I want so I'm
> taking my toys and going home" tantrum. I don't mind making changes in
> response to reviews etc.
>
> As noted above I'm genuinely still unable to see how I'd selectively
> untangle the change I made in a way that wouldn't either fix bugs in
> the test or introduce new bugs while I was at it, and that I'd be
> comfortable putting my name behind.

I fail to see why the "rewrite flakyness-causing test aid" that you
set out to perform should need more than a two-liner change. There is
no reason to touch '&& rm "$actual" actual expect' at the end. It is
the success case, where the flakyness-causing "aid" does not even kick
in.

 t/t4013-diff-various.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git c/t/t4013-diff-various.sh w/t/t4013-diff-various.sh
index 6cca8b84a6..da6da13920 100755
--- c/t/t4013-diff-various.sh
+++ w/t/t4013-diff-various.sh
@@ -229,8 +229,8 @@ do
 		rm -f "$actual" actual expect
 	else
 		# this is to help developing new tests.
-		cp "$actual" "$expect"
-		false
+		mv "$actual" "$expect.new"
+		BUG "No $expect; check $expect.new and use it perhaps?"
 	fi
 	'
 done <<\EOF
 n
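The flakiness this two-liner targets can be reproduced in isolation.
A hypothetical stand-alone sketch (not t4013 itself; file names are
made up) of the old cp-then-false behavior: a brand-new test fails
once, seeds its own expectation file from whatever output it happened
to produce, and then spuriously passes on every subsequent run.

```shell
#!/bin/sh
# Hypothetical sketch of the old "cp to expect, then false" pattern
# and why it makes reruns pass without anyone auditing the expectation.

echo "some diff output" >actual

run_check () {
	if test -f expect
	then
		cmp -s expect actual && echo "pass"
	else
		# old behavior: seed the expectation from actual, then fail
		cp actual expect
		echo "fail (no expectation yet)"
	fi
}

run_check   # first run:  "fail (no expectation yet)"; expect now exists
run_check   # second run: "pass", against an unaudited expectation
```

Moving the output to "$expect.new" and stopping with BUG, as in the
patch above, keeps the second run failing until the developer
deliberately promotes the file.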
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 67f6411aff9..228ff100c61 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -200,10 +200,12 @@ do
 	esac
 	test=$(echo "$label" | sed -e 's|[/ ][/ ]*|_|g')
 	pfx=$(printf "%04d" $test_count)
-	expect="$TEST_DIRECTORY/t4013/diff.$test"
+	expect_relative="t4013/diff.$test"
+	expect="$TEST_DIRECTORY/$expect_relative"
 	actual="$pfx-diff.$test"
 
 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
+		test_when_finished "rm $actual" &&
 		{
 			echo "$ git $cmd"
 			case "$magic" in
@@ -216,16 +218,19 @@ do
 			-e "s/^\\(.*mixed; boundary=\"-*\\)$V\\(-*\\)\"\$/\\1g-i-t--v-e-r-s-i-o-n\2\"/"
 			echo "\$"
 		} >"$actual" &&
-		if test -f "$expect"
+
+		if ! test -f "$expect"
 		then
-			process_diffs "$actual" >actual &&
-			process_diffs "$expect" >expect &&
-			test_cmp expect actual
-		else
-			# this is to help developing new tests.
-			cp "$actual" "$expect"
-			false
-		fi
+			expect_new="$expect.new" &&
+			cp "$actual" "$expect_new" &&
+			BUG "Have no \"$expect_relative\", new test? The output is in \"$expect_new\", maybe use that?"
+		fi &&
+
+		test_when_finished "rm actual" &&
+		process_diffs "$actual" >actual &&
+		test_when_finished "rm expect" &&
+		process_diffs "$expect" >expect &&
+		test_cmp expect actual
 	'
 done <<\EOF
 diff-tree initial
If a new test is added to this "while read magic cmd" test facility
added in 3c2f75b590c (t4013: add tests for diff/log family output
options., 2006-06-26) but no test file is added, it'll fail the first
time, but then succeed on subsequent runs as a new file has been added
in t4013.

Let's accomplish the same aim in a way that doesn't cause subsequent
test runs to succeed. If we can't find the file we'll BUG out, and
suggest to the developer that they copy our "expect.new" file over;
unlike the previous "expect" file this won't be picked up on
subsequent runs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4013-diff-various.sh | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)