Message ID | 87tuhmk19c.fsf@evledraar.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | test-lib.sh musings: test_expect_failure considered harmful |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Mon, Oct 11 2021, Junio C Hamano wrote:
>
> [Removed "In-reply-to: <xmqq5yu3b80j.fsf@gitster.g>" with the Subject
> change]

Please do not do the former, although it is welcome to change Subject.

> Presumably with test_expect_failure.
>
> I'll change it, in this case we'd end up with a test_expect_success at
> the end, so it doesn't matter much & I don't care.

I do agree with you that compared to expect_success, which requires
_all_ steps to succeed, so a failure in any of its steps is
immediately noticeable, it is harder to write and keep
expect_failure useful, because it is not like we are happy to see
any failure in any step. We do not expect a failure in many
preparation and conclusion steps in the &&-chain in an expect_failure
block, and we consider it an error if these steps fail. We want to
mark only a single step as exhibiting an expected but undesirable
behaviour.

But even with the shortcomings of expect_failure, it still is much
better than claiming that we expect a bogus outcome.

Improving the shortcomings of expect_failure would be a much better
use of our time than advocating an abuse of expect_success, I would
think.

On Tue, Oct 12 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Mon, Oct 11 2021, Junio C Hamano wrote:
> [...]
>> Presumably with test_expect_failure.
>>
>> I'll change it, in this case we'd end up with a test_expect_success at
>> the end, so it doesn't matter much & I don't care.
>
> I do agree with you that compared to expect_success, which requires
> _all_ steps to succeed, so a failure in any of its steps is
> immediately noticeable, it is harder to write and keep
> expect_failure useful, because it is not like we are happy to see
> any failure in any step. We do not expect a failure in many
> preparation and conclusion steps in the &&-chain in an expect_failure
> block, and we consider it an error if these steps fail. We want to
> mark only a single step as exhibiting an expected but undesirable
> behaviour.
>
> But even with the shortcomings of expect_failure, it still is much
> better than claiming that we expect a bogus outcome.
>
> Improving the shortcomings of expect_failure would be a much better
> use of our time than advocating an abuse of expect_success, I would
> think.

I'd like to improve it, but I'll have to get any patch in this area
past you :)

My reading of your opinion from past exchanges is that you find it
objectionable to say "this is a success" when it's not the /desired/
behavior, whereas I think it's valuable to just test for and document
the exact existing behavior, even if it's not desirable.

So you don't really need a function different from
test_expect_success, just a comment saying "this should change", or
a "TODO" (without a hash, so it's not TAP syntax) added to the
description of the test.

But if you agree that we shouldn't conflate failures in the different
steps I think we're getting somewhere, so to begin with what do you
think about the hack in the v2 of my series?
https://lore.kernel.org/git/cover-v2-0.2-00000000000-20211012T142950Z-avarab@gmail.com/

If we were to promote those semantics to something that
test_expect_failure would use it would be the below, which I think is
the only sensible way to use it. But that would mean changing all
existing test_expect_failure uses in the test suite, so it would need
either a pretty large patch, or some incremental steps to get there.

It will also mean we can't use it for any test that's actually flaky,
so we'll need a test_expect_flaky, or have some test-specific
workarounds in those areas.

diff --git a/t/t7815-grep-binary.sh b/t/t7815-grep-binary.sh
index 90ebb64f46e..9a95c9e7d69 100755
--- a/t/t7815-grep-binary.sh
+++ b/t/t7815-grep-binary.sh
@@ -64,7 +64,7 @@ test_expect_success 'git grep ile a' '
 '
 
 test_expect_failure 'git grep .fi a' '
-	git grep .fi a
+	test_must_fail git grep .fi a
 '
 
 test_expect_success 'grep respects binary diff attribute' '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 8361b5c1c57..6d9291b7ead 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -728,8 +728,8 @@ test_known_broken_ok_ () {
 	then
 		write_junit_xml_testcase "$* (breakage fixed)"
 	fi
-	test_fixed=$(($test_fixed+1))
-	say_color error "ok $test_count - $@ # TODO known breakage vanished"
+	test_broken=$(($test_broken+1))
+	say_color warn "not ok $test_count - $@ # TODO known breakage"
 }
 
 test_known_broken_failure_ () {
@@ -737,8 +737,8 @@ test_known_broken_failure_ () {
 	then
 		write_junit_xml_testcase "$* (known breakage)"
 	fi
-	test_broken=$(($test_broken+1))
-	say_color warn "not ok $test_count - $@ # TODO known breakage"
+	test_fixed=$(($test_fixed+1))
+	say_color error "not ok $test_count - $@ # TODO a 'known breakage' changed behavior!"
 }
 
 test_debug () {

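To make the flipped semantics in the diff above concrete, here is a minimal standalone sketch. It is not code from test-lib.sh: the function name `expect_failure_strict` is made up for illustration, and only the two message strings are borrowed from the diff. The assumption is that the test body asserts the current (broken) behavior, so a passing body means "still broken in the known way" and a failing body means "behavior changed".

```shell
#!/bin/sh
# Hypothetical stand-in for the proposed test_expect_failure semantics.
# The body must assert the *current* behavior, e.g. with "! cmd".
expect_failure_strict () {
	desc=$1 body=$2
	if (eval "$body") >/dev/null 2>&1
	then
		# Still broken in exactly the way that was documented.
		echo "not ok - $desc # TODO known breakage"
	else
		# Fixed, or broken differently; either way someone should look.
		echo "not ok - $desc # TODO a 'known breakage' changed behavior!"
		return 1
	fi
}

# "false" stands in for a command with a known bug.
expect_failure_strict 'known bug still present' '! false'
# "true" stands in for the same command after the bug was fixed.
expect_failure_strict 'known bug got fixed' '! true' || :
```

Under these assumptions a fixed bug can no longer slip through silently: the second call returns non-zero until the test is updated to expect the new behavior.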
On 10/12/2021 12:45 PM, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Mon, Oct 11 2021, Junio C Hamano wrote:
>>
>> [Removed "In-reply-to: <xmqq5yu3b80j.fsf@gitster.g>" with the Subject
>> change]
>
> Please do not do the former, although it is welcome to change Subject.
>
>> Presumably with test_expect_failure.
>>
>> I'll change it, in this case we'd end up with a test_expect_success at
>> the end, so it doesn't matter much & I don't care.
>
> I do agree with you that compared to expect_success, which requires
> _all_ steps to succeed, so a failure in any of its steps is
> immediately noticeable, it is harder to write and keep
> expect_failure useful, because it is not like we are happy to see
> any failure in any step. We do not expect a failure in many
> preparation and conclusion steps in the &&-chain in an expect_failure
> block, and we consider it an error if these steps fail. We want to
> mark only a single step as exhibiting an expected but undesirable
> behaviour.
>
> But even with the shortcomings of expect_failure, it still is much
> better than claiming that we expect a bogus outcome.
>
> Improving the shortcomings of expect_failure would be a much better
> use of our time than advocating an abuse of expect_success, I would
> think.

I agree that test_expect_failure has these drawbacks. I've recently
been using _expect_success to document "bad" behavior so we can verify
that the test changes when that behavior is fixed. But it does have
the drawback of looking like we claim the result is by design.

One possible way to correct this is to create a "test_expected_failure"
helper that could be placed on the step(s) of the &&-chain that are
expected to fail. The helper could set some variable to true if the
failure is hit, and false otherwise. It can also convert a failure
into a positive result. Then, test_expect_failure could look for that
variable's value (after verifying that the &&-chain returns success)
to show that all expected failures completed correctly.

This could have the side-effect of having a "fixed" test_expect_failure
show as a failed test, not a "TODO" message.

Thanks,
-Stolee

Derrick Stolee <stolee@gmail.com> writes:

>> But even with the shortcomings of expect_failure, it still is much
>> better than claiming that we expect a bogus outcome.
>>
>> Improving the shortcomings of expect_failure would be a much better
>> use of our time than advocating an abuse of expect_success, I would
>> think.
>
> I agree that test_expect_failure has these drawbacks. I've recently
> been using _expect_success to document "bad" behavior so we can verify
> that the test changes when that behavior is fixed. But it does have
> the drawback of looking like we claim the result is by design.

Yeah, I think I saw (and I think I used the same technique myself)
people expect a bad output with test_expect_success with an in-code
(not in-log) comment that explicitly says "This documents the
current behaviour, which is wrong", and that is a very acceptable
solution, I would think.

> One possible way to correct this is to create a "test_expected_failure"
> helper that could be placed on the step(s) of the &&-chain that are
> expected to fail. The helper could set some variable to true if the
> failure is hit, and false otherwise. It can also convert a failure
> into a positive result. Then, test_expect_failure could look for that
> variable's value (after verifying that the &&-chain returns success)
> to show that all expected failures completed correctly.

Yup, I would very much like the direction, and further imagine that
the above approach can be extended to ...

> This could have the side-effect of having a "fixed" test_expect_failure
> show as a failed test, not a "TODO" message.

... avoid such downside. Perhaps call that magic "we know this step
fails currently" test_known_breakage and declare that we deprecate
the use of test_expect_failure in new tests. Such a test might look
like this:

    test_expect_success 'commit error message should not duplicate' '
	test_when_finished "chmod -R u+rwx ." &&
	chmod u-rwx .git/objects/ &&
	orig_head=$(git rev-parse HEAD) &&
	test_must_fail git commit --allow-empty -m "read-only" 2>rawerr &&
	grep "insufficient permission" rawerr >err &&
	test_known_breakage test_line_count = 1 err &&
	new_head=$(git rev-parse HEAD) &&
	test "$orig_head" = "$new_head"
    '

which may use your trick to turn both failure and success to OK (to
let the remainder of the test continue) but signal the
surrounding test_expect_success to say either "TODO known breakage"
or "Fixed".

Thanks.

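One way the proposed helper could behave is sketched below as a standalone script. Everything in it is an assumption made for illustration: test_known_breakage does not exist in test-lib.sh, `run_test` is only a stand-in for test_expect_success, and `one_line` stands in for "test_line_count = 1". The sketch records the outcome of the wrapped step in a variable, always returns success so the &&-chain continues, and lets the surrounding runner pick the report.

```shell
#!/bin/sh
# Hypothetical sketch; these helpers are not part of git's test-lib.sh.

known_breakage_state=

# Run "$@" and remember the outcome, but always return success so the
# rest of the &&-chain still runs.
test_known_breakage () {
	if "$@"
	then
		# Do not overwrite an earlier "broken" with "fixed".
		test -n "$known_breakage_state" || known_breakage_state=fixed
	else
		known_breakage_state=broken
	fi
	return 0
}

# Stand-in for test_expect_success: any unwrapped failure is a plain
# "not ok"; otherwise the recorded state selects the message.
run_test () {
	desc=$1 body=$2
	known_breakage_state=
	if ! eval "$body"
	then
		echo "not ok - $desc"
		return 1
	fi
	case "$known_breakage_state" in
	broken) echo "not ok - $desc # TODO known breakage" ;;
	fixed)  echo "ok - $desc # TODO known breakage vanished" ;;
	*)      echo "ok - $desc" ;;
	esac
}

# Stand-in for "test_line_count = 1 <file>".
one_line () {
	test $(wc -l <"$1") -eq 1
}

# Simulate the duplicated error message discussed in this thread.
err=$(mktemp)
printf 'error: insufficient permission\nerror: insufficient permission\n' >"$err"
run_test 'commit error message should not duplicate' '
	test -s "$err" &&
	test_known_breakage one_line "$err" &&
	test -f "$err"
'
rm -f "$err"
```

With the two-line file the wrapped step fails, so the run is reported as a known breakage while the steps before and after it are still required to succeed.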
On Wed, Oct 13 2021, Junio C Hamano wrote:

> Derrick Stolee <stolee@gmail.com> writes:
>
>>> But even with the shortcomings of expect_failure, it still is much
>>> better than claiming that we expect a bogus outcome.
>>>
>>> Improving the shortcomings of expect_failure would be a much better
>>> use of our time than advocating an abuse of expect_success, I would
>>> think.
>>
>> I agree that test_expect_failure has these drawbacks. I've recently
>> been using _expect_success to document "bad" behavior so we can verify
>> that the test changes when that behavior is fixed. But it does have
>> the drawback of looking like we claim the result is by design.
>
> Yeah, I think I saw (and I think I used the same technique myself)
> people expect a bad output with test_expect_success with an in-code
> (not in-log) comment that explicitly says "This documents the
> current behaviour, which is wrong", and that is a very acceptable
> solution, I would think.
>
>> One possible way to correct this is to create a "test_expected_failure"
>> helper that could be placed on the step(s) of the &&-chain that are
>> expected to fail. The helper could set some variable to true if the
>> failure is hit, and false otherwise. It can also convert a failure
>> into a positive result. Then, test_expect_failure could look for that
>> variable's value (after verifying that the &&-chain returns success)
>> to show that all expected failures completed correctly.
>
> Yup, I would very much like the direction, and further imagine that
> the above approach can be extended to ...
>
>> This could have the side-effect of having a "fixed" test_expect_failure
>> show as a failed test, not a "TODO" message.
>
> ... avoid such downside. Perhaps call that magic "we know this step
> fails currently" test_known_breakage and declare that we deprecate
> the use of test_expect_failure in new tests. Such a test might look
> like this:
>
>     test_expect_success 'commit error message should not duplicate' '
> 	test_when_finished "chmod -R u+rwx ." &&
> 	chmod u-rwx .git/objects/ &&
> 	orig_head=$(git rev-parse HEAD) &&
> 	test_must_fail git commit --allow-empty -m "read-only" 2>rawerr &&
> 	grep "insufficient permission" rawerr >err &&
> 	test_known_breakage test_line_count = 1 err &&
> 	new_head=$(git rev-parse HEAD) &&
> 	test "$orig_head" = "$new_head"
>     '
>
> which may use your trick to turn both failure and success to OK (to
> let the remainder of the test continue) but signal the
> surrounding test_expect_success to say either "TODO known breakage"
> or "Fixed".

I don't see how it's a downside. Considering the behavior bad now
shouldn't entail that we should be fuzzy about testing what exactly
*is* happening right now. If one bad but expected state turns into
another unexpected bad state the test should fail.

In this case this thread spawned off a fix where we print an error
twice, instead of once:
https://lore.kernel.org/git/cover-v2-0.2-00000000000-20211012T142950Z-avarab@gmail.com/#t

That sucks a bit, but printing pages full of such errors in a loop
would be way worse, and that is something we'd hide if we insisted on
not testing the exact emitted output, or on such "test_known_breakage"
helpers.

It's only a downside in that when we fix bugs we'll need to go through
the "expected failure" tests and adjust them, but that seems like a
feature to me. As things stand I can submit a patch that fixes a known
bug with no test suite changes, and I might not even notice that I
fixed one.

We may want to have tests for something that really is
nondeterministic, e.g. for the code I added in 2d3c02f5db6 (die():
stop hiding errors due to overzealous recursion guard, 2017-06-21).
But that'll only be the case for some tiny minority (or none) of the
existing callers of "test_expect_failure".

1. https://lore.kernel.org/git/87tuhmk19c.fsf@evledraar.gmail.com/

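The point about pinning the exact current output can be illustrated with a small standalone sketch. `buggy_command` and `count_permission_errors` are made-up names, and the duplicated message only imitates the bug discussed above. The check asserts the known-wrong count of 2, so a fix (1 line) and a worse regression (many lines) both make it fail.

```shell
#!/bin/sh
# Standalone sketch; buggy_command is a made-up stand-in for a command
# that currently prints the same error twice instead of once.
buggy_command () {
	echo "error: insufficient permission" >&2
	echo "error: insufficient permission" >&2
	return 1
}

# Count how many times the error appears on stderr of "$@".
count_permission_errors () {
	"$@" 2>&1 >/dev/null | grep -c "insufficient permission"
}

# Pin the exact current behavior: "2" is wrong but known. Any other
# count, better or worse, fails the check and forces a test update.
count=$(count_permission_errors buggy_command)
if test "$count" -eq 2
then
	echo "ok - TODO: error is printed twice, should be once"
else
	echo "not ok - error count changed to $count"
fi
```

A looser check such as "at least one matching line" would keep passing if the command started printing the error hundreds of times, which is the failure mode this approach is meant to catch.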
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index df544bb321f..15724e6a358 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -601,4 +601,13 @@ test_expect_success 'branch -m with the initial branch' '
 	test again = $(git -C rename-initial symbolic-ref --short HEAD)
 '
 
+test_expect_failure 'do stuff' '
+	git config alias.fake-SEGV "!f() { echo Fake SEGV; exit 139; }; f" &&
+	git config alias.fake-BUG "!f() { echo Fake BUG; exit 99; }; f" &&
+
+	git fake-BUG >expect &&
+	git fake-SEGV >actual &&
+	test_cmp expect actual
+'
+
 test_done