Message ID | patch-1.7-4624abc2591-20220318T002951Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | test-lib-functions: a better "test_expect_failure" | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Add an alternative to the "test_expect_failure" function. Like > "test_expect_failure" it will marks a test as "not ok ... TODO" in the "will marks" -> "marks". > TAP output, and thus document that it's a "TODO" test that fails > currently. > > Unlike "test_expect_failure" it asserts that the tested code in > exactly one manner, and not any other. ECANNOTPARSE due to a verb missing. For now I'll assume "It asserts that the tested code fails in exactly one matter" and keep going. > We'll thus stop conflating > e.g. segfaults with other sorts of errors, and generally be able to > tell if our "expected to fail" tests start failing in new and > unexpected ways. The above makes it sound wonderful but I got somewhat confused when I tried to see how well it conveys what it wants to tell by picking an example from this patch. Say, for example: > -test_expect_failure 'git grep .fi a' ' > - git grep .fi a > +test_expect_todo 'git grep .fi a' ' > + test_must_fail git grep .fi a > ' So, we used to say "We eventually, when things are fixed, want 'git grep' to find a string '.fi' in file 'a', but currently this does not work". Now the updated one says "We know 'git grep' fails to find string '.fi' in file 'a'" If it is a trivial single statement test like the above, the distinction does not matter, but if it were a test with multiple steps, the readability would become quite different. It would turn test_expect_failure 'sample test' ' preparation 1 && preparation 2 && git command invocation that we want to succeed ' into test_expect_todo 'sample test' ' preparation 1 && preparation 2 && test_must_fail git command invocation that we want to succeed ' "expect_failure" expects the whole thing to fail, so we will miss if any preparation steps fail, but "expect_todo" is like "expect_success" so it will help us debuging the test by catching a silly mistake in preparation steps. Marking the step that demonstrates the current shortcomings with "MUST FAIL" is a bad taste, but let's pretend that we didn't notice it ;-). Other than that, it looks like an improvement. > * More generally, "test_expect_failure" by design doesn't test what > does, but just asserts that the test fails in some way. Exactly. It matters in complex tests that !(A&B&C) is different from (A&B&!C), the latter of which is what we want to do with tests that document what currently does not work. > - test_expect_failure [<prereq>] <message> <script> > > This is NOT the opposite of test_expect_success, but is used > - to mark a test that demonstrates a known breakage. Unlike > + to mark a test that demonstrates a known breakage whose exact failure > + behavior isn't predictable. > + > + If the exact breakage is known the "test_expect_todo" function > + should be used instead. Usethis function if it's hard to pin down > + the exact nature of the failure. Unlike "Usethis" -> "Use this" > -test_expect_failure 'clone --local detects misnamed objects' ' > - test_must_fail git clone --local misnamed misnamed-checkout > +test_expect_todo 'clone --local detects misnamed objects' ' > + git clone --local misnamed misnamed-checkout > ' Just like too many negatives confuse readers, I have to say this is quite confusing. Do we or do we not want "git clone" invocation to succeed? > +test_expect_failure () { > + _test_expect_todo test_expect_failure "$@" > +} > + > +test_expect_todo () { > + _test_expect_todo test_expect_todo "$@" > +} It is a bit surprising that _test_expect_todo does not share more of its implementation with test_expect_success, but let's pretend we didn't see it. Looks like a good first step. I'd stop reading the series for now at this one.
Junio C Hamano <gitster@pobox.com> writes: > Marking the step that demonstrates the current shortcomings with > "MUST FAIL" is a bad taste, but let's pretend that we didn't notice > it ;-). Other than that, it looks like an improvement. > ... >> +test_expect_failure () { >> + _test_expect_todo test_expect_failure "$@" >> +} >> + >> +test_expect_todo () { >> + _test_expect_todo test_expect_todo "$@" >> +} > > It is a bit surprising that _test_expect_todo does not share more of > its implementation with test_expect_success, but let's pretend we > didn't see it. With a bit more tweak, I think this can be made palatable: * introduce something that is *NOT* test_must_fail but behaves like so, but with a bit of magic (see below). * do not introduce test_expect_todo, but use an improved version of test_expect_success. Let's illustrate what I mean by starting from an example that we want to have _after_ fixing an known breakage. Let's say we expect that our test preparation succeeds, 'git foo' fails when given a bad option, 'git foo' runs successfully, and the command is expected to emit certain output. We may assert the ideal future world like so: test_expect_success 'make sure foo works the way we want' ' preparatory step && test_must_fail git foo --bad-option >error && grep "expected error message" error && ! grep "unwanted error message" error && git foo >output && grep expected output && ! grep unwanted output ' Let's also imagine that right now, option parsing in "git foo", works but the main execution of the command does not work. With test_expect_todo, you have to write something like this to document the current breakage: test_expect_todo 'document breakage' ' preparatory step && test_must_fail git foo --bad-option >error && grep "expected error message" error && ! grep "unwanted error message" error && test_must_fail git foo >output && ! grep expected output && grep unwanted output ' You can see that this makes one thing unclear. Among the two test_must_fail and two !, which one(s) document the breakage? In other words, which one of these four negations do we wish to lift eventually? The answer is the latter two (we said that handling of "--bad-option" is already working), but it is not obvious in the above test_expect_todo test sequence. I'd suggest we allow our test to be written this way: test_expect_success 'make sure foo works the way we want' ' preparatory step && test_must_fail git foo --bad-option >error && grep "expected error message" error && ! grep "unwanted error message" error && test_ki git foo >output && test_ki grep expected output && test_ki ! grep unwanted output ' and teach test_expect_success that an invocation of test_ki ("known issue"---a better name that is NOT test_must_fail very much welcome) means we hope this test someday passes without test_ki but not today, i.e. what your test_expect_todo means, and we unfortunately have to expect that the lines annotated with test_ki would "fail". To readers, test_ki should appear as if an annotation to a single command that highlights what we want to eventually be able to fix, and when the issue around the line marked as test_ki is fixed, we can signal the fact by removing it from the beginning of these lines (that is why the last one is "test_ki !" and not "! test_ki"). To the test framework and the shell that is running the test, test_ki would be almost identical to test_must_fail, i.e. runs the rest of the command, catch segv and report, but otherwise the failure from that "rest of the command" execution is turned into "success" that lets &&-chain to continue. In addition, it tells the test_expect_success running it that a success of the test piece should be shown as TODO (expected failure). Hmm?
On Fri, Mar 18 2022, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> Marking the step that demonstrates the current shortcomings with >> "MUST FAIL" is a bad taste, but let's pretend that we didn't notice >> it ;-). Other than that, it looks like an improvement. >> ... >>> +test_expect_failure () { >>> + _test_expect_todo test_expect_failure "$@" >>> +} >>> + >>> +test_expect_todo () { >>> + _test_expect_todo test_expect_todo "$@" >>> +} >> >> It is a bit surprising that _test_expect_todo does not share more of >> its implementation with test_expect_success, but let's pretend we >> didn't see it. > > With a bit more tweak, I think this can be made palatable: > > * introduce something that is *NOT* test_must_fail but behaves like > so, but with a bit of magic (see below). > > * do not introduce test_expect_todo, but use an improved version of > test_expect_success. > > Let's illustrate what I mean by starting from an example that we > want to have _after_ fixing an known breakage. Let's say we expect > that our test preparation succeeds, 'git foo' fails when given a bad > option, 'git foo' runs successfully, and the command is expected to > emit certain output. We may assert the ideal future world like so: > > test_expect_success 'make sure foo works the way we want' ' > preparatory step && > test_must_fail git foo --bad-option >error && > grep "expected error message" error && > ! grep "unwanted error message" error && > git foo >output && > grep expected output && > ! grep unwanted output > ' > > Let's also imagine that right now, option parsing in "git foo", > works but the main execution of the command does not work. > > With test_expect_todo, you have to write something like this > to document the current breakage: > > test_expect_todo 'document breakage' ' > preparatory step && > test_must_fail git foo --bad-option >error && > grep "expected error message" error && > ! grep "unwanted error message" error && > test_must_fail git foo >output && > ! grep expected output && > grep unwanted output > ' > > You can see that this makes one thing unclear. > > Among the two test_must_fail and two !, which one(s) document the > breakage? In other words, which one of these four negations do we > wish to lift eventually? The answer is the latter two (we said that > handling of "--bad-option" is already working), but it is not obvious > in the above test_expect_todo test sequence. > > I'd suggest we allow our test to be written this way: > > test_expect_success 'make sure foo works the way we want' ' > preparatory step && > test_must_fail git foo --bad-option >error && > grep "expected error message" error && > ! grep "unwanted error message" error && > test_ki git foo >output && > test_ki grep expected output && > test_ki ! grep unwanted output > ' > > and teach test_expect_success that an invocation of test_ki ("known > issue"---a better name that is NOT test_must_fail very much welcome) > means we hope this test someday passes without test_ki but not > today, i.e. what your test_expect_todo means, and we unfortunately > have to expect that the lines annotated with test_ki would "fail". > > To readers, test_ki should appear as if an annotation to a single > command that highlights what we want to eventually be able to fix, > and when the issue around the line marked as test_ki is fixed, we > can signal the fact by removing it from the beginning of these lines > (that is why the last one is "test_ki !" and not "! test_ki"). > > To the test framework and the shell that is running the test, > test_ki would be almost identical to test_must_fail, i.e. runs the > rest of the command, catch segv and report, but otherwise the > failure from that "rest of the command" execution is turned into > "success" that lets &&-chain to continue. In addition, it tells > the test_expect_success running it that a success of the test piece > should be shown as TODO (expected failure). > > Hmm? Have you had the time to look past patch 1/7 of this series? 2/7 introduces a "test_todo" helper, the "test_expect_todo" is just the basic top-level primitive. I don't think we can categorically replace all of the "test_expect_failure" cases, because in some of those it's too much of a hassle to assert the exact current behavior, or we don't really care. But I think for most cases instead of a: test_ki ! grep unwanted output It makes more sense to do (as that helper does): test_todo grep --want yay --expect unwanted -- output Which is quite a handful, which is why the series goes on to e.g. introduce a todo_test_cmp, used e.g. like this (and we can easily add more wrappers for common cases): cat >want <<-EOF && $(git rev-parse HEAD) EOF cat >expect <<-\EOF && error: can't rev-parse stuff EOF test_might_fail git some-cmd HEAD >actual && todo_test_cmp want expect actual I.e. if you just want to throw your hands in the air and say "I don't care how this fails and just emit a 'not ok .. TODO' line" we already have test_expect_failure for that use-case. I think in most other cases documenting that something is broken or should behave differently shouldn't be synonymous with not caring *how* that unwanted behavior works right now. Understanding of your past commentary on this topic is that you strongly objected to not having the test suite output reflect that a given test was "not ok ... TODO" in some way. I.e. even though I think "test_expect_success" has the approximate *semantics* we want we shouldn't use those. But I think the combination of "test_expect_todo" and the "test_todo" primitive should satisfy that, and will give us accurate assertions about what we're actually doing now.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> emit certain output. We may assert the ideal future world like so: >> >> test_expect_success 'make sure foo works the way we want' ' >> preparatory step && >> test_must_fail git foo --bad-option >error && >> grep "expected error message" error && >> ! grep "unwanted error message" error && >> git foo >output && >> grep expected output && >> ! grep unwanted output >> ' >> >> Let's also imagine that right now, option parsing in "git foo", >> works but the main execution of the command does not work. >> >> With test_expect_todo, you have to write something like this >> to document the current breakage: >> >> test_expect_todo 'document breakage' ' >> preparatory step && >> test_must_fail git foo --bad-option >error && >> grep "expected error message" error && >> ! grep "unwanted error message" error && >> test_must_fail git foo >output && >> ! grep expected output && >> grep unwanted output >> ' >> >> You can see that this makes one thing unclear. >> >> Among the two test_must_fail and two !, which one(s) document the >> breakage? In other words, which one of these four negations do we >> wish to lift eventually? The answer is the latter two (we said that >> handling of "--bad-option" is already working), but it is not obvious >> in the above test_expect_todo test sequence. >> >> I'd suggest we allow our test to be written this way: >> >> test_expect_success 'make sure foo works the way we want' ' >> preparatory step && >> test_must_fail git foo --bad-option >error && >> grep "expected error message" error && >> ! grep "unwanted error message" error && >> test_ki git foo >output && >> test_ki grep expected output && >> test_ki ! grep unwanted output >> ' >> >> and teach test_expect_success that an invocation of test_ki ("known >> issue"---a better name that is NOT test_must_fail very much welcome) >> means we hope this test someday passes without test_ki but not >> today, i.e. what your test_expect_todo means, and we unfortunately >> have to expect that the lines annotated with test_ki would "fail". > Have you had the time to look past patch 1/7 of this series? 2/7 > introduces a "test_todo" helper, the "test_expect_todo" is just the > basic top-level primitive. No, and I do not have to. I care about the most basic form first, and if you cannot get it right, it is not interesting. You can consider the test_ki above as the primitive form of your "test_todo" that says "I want the command to give true, but I know it currently gives false". And quite honestly, I am not interested in _how_ it currently happens to break. We may want the command being tested to eventually count three commits, but due to a bug, it may only count one. You may say "test_todo count --want 3 --expect 1 blah", but the "expect" part is much less interesting than the fact that the command being tested on _that_ line (not the whole sequence run with test_expect_failure) is clearly documented to want 3 but currently is broken, and it can be clearly distinguished from the normal test_must_fail or ! that documents that we do want a failure out of the command being tested there. So with or without the "higher level" wrappers, how else, other than the way I showed in the message you are responding to as a rewrite of the example to use test_expect_todo, that uses two test_must_fail and two ! and makes which ones are expected failure and which ones are documentation of the current breakage, do you propose to write the equivalent? It may be that your test_todo may be a different way to spell the test_ki marker I showed above, and if that is the case it is perfectly fine, but I want it to be THE primitive, not test_must_fail or !, to mark a single command in the test that currently does not work as expected. > I don't think we can categorically replace all of the > "test_expect_failure" cases, because in some of those it's too much of a > hassle to assert the exact current behavior, or we don't really care. > > But I think for most cases instead of a: > > test_ki ! grep unwanted output > > It makes more sense to do (as that helper does): > > test_todo grep --want yay --expect unwanted -- output My take is the complete opposite. We can and should start small, and how exactly the current implementation happens to be broken does not matter most of the time.
On Sat, Mar 19 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >>> emit certain output. We may assert the ideal future world like so: >>> >>> test_expect_success 'make sure foo works the way we want' ' >>> preparatory step && >>> test_must_fail git foo --bad-option >error && >>> grep "expected error message" error && >>> ! grep "unwanted error message" error && >>> git foo >output && >>> grep expected output && >>> ! grep unwanted output >>> ' >>> >>> Let's also imagine that right now, option parsing in "git foo", >>> works but the main execution of the command does not work. >>> >>> With test_expect_todo, you have to write something like this >>> to document the current breakage: >>> >>> test_expect_todo 'document breakage' ' >>> preparatory step && >>> test_must_fail git foo --bad-option >error && >>> grep "expected error message" error && >>> ! grep "unwanted error message" error && >>> test_must_fail git foo >output && >>> ! grep expected output && >>> grep unwanted output >>> ' >>> >>> You can see that this makes one thing unclear. >>> >>> Among the two test_must_fail and two !, which one(s) document the >>> breakage? In other words, which one of these four negations do we >>> wish to lift eventually? The answer is the latter two (we said that >>> handling of "--bad-option" is already working), but it is not obvious >>> in the above test_expect_todo test sequence. >>> >>> I'd suggest we allow our test to be written this way: >>> >>> test_expect_success 'make sure foo works the way we want' ' >>> preparatory step && >>> test_must_fail git foo --bad-option >error && >>> grep "expected error message" error && >>> ! grep "unwanted error message" error && >>> test_ki git foo >output && >>> test_ki grep expected output && >>> test_ki ! grep unwanted output >>> ' >>> >>> and teach test_expect_success that an invocation of test_ki ("known >>> issue"---a better name that is NOT test_must_fail very much welcome) >>> means we hope this test someday passes without test_ki but not >>> today, i.e. what your test_expect_todo means, and we unfortunately >>> have to expect that the lines annotated with test_ki would "fail". > >> Have you had the time to look past patch 1/7 of this series? 2/7 >> introduces a "test_todo" helper, the "test_expect_todo" is just the >> basic top-level primitive. > > No, and I do not have to. I care about the most basic form first, > and if you cannot get it right, it is not interesting. You can > consider the test_ki above as the primitive form of your "test_todo" > that says "I want the command to give true, but I know it currently > gives false". Sure, and I do have that implemented. If you're just asking that my "test_todo" or another helper do that by default, then that's easy. I.e. I've got that, but not as one short "test_*" verb. > And quite honestly, I am not interested in _how_ it currently > happens to break. We may want the command being tested to > eventually count three commits, but due to a bug, it may only count > one. You may say "test_todo count --want 3 --expect 1 blah", but > the "expect" part is much less interesting than the fact that the > command being tested on _that_ line (not the whole sequence run with > test_expect_failure) is clearly documented to want 3 but currently > is broken, and it can be clearly distinguished from the normal > test_must_fail or ! that documents that we do want a failure out of > the command being tested there. Yes, if you don't want to test the exact behavior you have/want that's also easy. > So with or without the "higher level" wrappers, how else, other than > the way I showed in the message you are responding to as a rewrite > of the example to use test_expect_todo, that uses two test_must_fail > and two ! and makes which ones are expected failure and which ones > are documentation of the current breakage, do you propose to write > the equivalent? It may be that your test_todo may be a different > way to spell the test_ki marker I showed above, and if that is the > case it is perfectly fine, but I want it to be THE primitive, not > test_must_fail or !, to mark a single command in the test that > currently does not work as expected. Sure, yes it's basically a different way to spell the same thing.... >> I don't think we can categorically replace all of the >> "test_expect_failure" cases, because in some of those it's too much of a >> hassle to assert the exact current behavior, or we don't really care. >> >> But I think for most cases instead of a: >> >> test_ki ! grep unwanted output >> >> It makes more sense to do (as that helper does): >> >> test_todo grep --want yay --expect unwanted -- output > > My take is the complete opposite. We can and should start small, > and how exactly the current implementation happens to be broken does > not matter most of the time. Well, the tip of this series leaves ~20 uses of test_expect_todo v.s. a remaining ~100 uses of test_expect_failure, so it is a small start. I converted those things I thought made the most sense. But I do think you want to test at least a fuzzy "how exactly" most of the time. The reason I worked on this was because while authoring the series you merged in ea05fd5fbf7 (Merge branch 'ab/keep-git-exit-codes-in-tests', 2022-03-16) I found that we had various test_expect_failure that failed in ways very different than what we'd expect. Or, saying that something exits non-zero and we'd like to fix it isn't the same as saying that we'd like to e.g. exclude it from SANITIZE=leak or SANITIZE=address testing. I.e. it still shouldn't leak, double-free() or run into a BUG(), and if it does we'd like to know most of the time. I think the only sensible thing to do to fix that is to have the semantics of test_expect_todo, within that you can always decide to ignore individual exit codes, but you can't really do it the other way around (which is what test_expect_failure does).
Hi Ævar On 19/03/2022 11:11, Ævar Arnfjörð Bjarmason wrote: > > On Sat, Mar 19 2022, Junio C Hamano wrote: >[...] >>> I don't think we can categorically replace all of the >>> "test_expect_failure" cases, because in some of those it's too much of a >>> hassle to assert the exact current behavior, or we don't really care. >>> >>> But I think for most cases instead of a: >>> >>> test_ki ! grep unwanted output >>> >>> It makes more sense to do (as that helper does): >>> >>> test_todo grep --want yay --expect unwanted -- output >> >> My take is the complete opposite. We can and should start small, >> and how exactly the current implementation happens to be broken does >> not matter most of the time. > > Well, the tip of this series leaves ~20 uses of test_expect_todo v.s. a > remaining ~100 uses of test_expect_failure, so it is a small start. I > converted those things I thought made the most sense. > > But I do think you want to test at least a fuzzy "how exactly" most of > the time. The reason I worked on this was because while authoring the > series you merged in ea05fd5fbf7 (Merge branch > 'ab/keep-git-exit-codes-in-tests', 2022-03-16) I found that we had > various test_expect_failure that failed in ways very different than what > we'd expect. > > Or, saying that something exits non-zero and we'd like to fix it isn't > the same as saying that we'd like to e.g. exclude it from SANITIZE=leak > or SANITIZE=address testing. I.e. it still shouldn't leak, double-free() > or run into a BUG(), and if it does we'd like to know most of the time. I think having test_todo based on test_must_fail as Junio suggested avoids this as it means the test will still fail on SIGSEV or SIGABRT (if we don't already do so we can make LSAN exit with the same code as vargrind which test_must_fail checks for). I had a look at some of the conversions with your test_todo --want/--expect/--reset and found the result really hard to follow. Junio's suggestions chimed with some things I've been thinking about so I had a go at implementing it and doing some sample conversions (see below). Marking the individual commands that should fail is a big step forward and the failing commands are checked to make sure they don't segfault etc. Best Wishes Phillip ---- >8 ----- t/t0000-basic.sh | 9 +++++-- t/t3401-rebase-and-am-rename.sh | 12 +++++----- t/t3424-rebase-empty.sh | 6 ++--- t/t4014-format-patch.sh | 20 ++++++++-------- t/test-lib-functions.sh | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------- 5 files changed, 116 insertions(+), 61 deletions(-) diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh index b007f0efef..53217d4cd5 100755 --- a/t/t0000-basic.sh +++ b/t/t0000-basic.sh @@ -103,16 +103,21 @@ test_expect_success 'subtest: 2/3 tests passing' ' test_expect_success 'subtest: a failing TODO test' ' write_and_run_sub_test_lib_test failing-todo <<-\EOF && + test_false () { + false + } test_expect_success "passing test" "true" test_expect_failure "pretend we have a known breakage" "false" + test_expect_success "known todo" "test_todo test_false" test_done EOF check_sub_test_lib_test failing-todo <<-\EOF > ok 1 - passing test > not ok 2 - pretend we have a known breakage # TODO known breakage - > # still have 1 known breakage(s) + > not ok 3 - known todo # TODO known breakage + > # still have 2 known breakage(s) > # passed all remaining 1 test(s) - > 1..2 + > 1..3 EOF ' diff --git a/t/t3401-rebase-and-am-rename.sh b/t/t3401-rebase-and-am-rename.sh index f18bae9450..cc5da9f5af 100755 --- a/t/t3401-rebase-and-am-rename.sh +++ b/t/t3401-rebase-and-am-rename.sh @@ -52,7 +52,7 @@ test_expect_success 'rebase --interactive: directory rename detected' ' ) ' -test_expect_failure 'rebase --apply: directory rename detected' ' +test_expect_success 'rebase --apply: directory rename detected' ' ( cd dir-rename && @@ -63,8 +63,8 @@ test_expect_failure 'rebase --apply: directory rename detected' ' git ls-files -s >out && test_line_count = 5 out && - test_path_is_file y/d && - test_path_is_missing x/d + test_todo test_path_is_file y/d && + test_todo test_path_is_missing x/d ) ' @@ -84,7 +84,7 @@ test_expect_success 'rebase --merge: directory rename detected' ' ) ' -test_expect_failure 'am: directory rename detected' ' +test_expect_success 'am: directory rename detected' ' ( cd dir-rename && @@ -97,8 +97,8 @@ test_expect_failure 'am: directory rename detected' ' git ls-files -s >out && test_line_count = 5 out && - test_path_is_file y/d && - test_path_is_missing x/d + test_todo test_path_is_file y/d && + test_todo test_path_is_missing x/d ) ' diff --git a/t/t3424-rebase-empty.sh b/t/t3424-rebase-empty.sh index 5e1045a0af..b7cae26075 100755 --- a/t/t3424-rebase-empty.sh +++ b/t/t3424-rebase-empty.sh @@ -34,15 +34,15 @@ test_expect_success 'setup test repository' ' git commit -m "Five letters ought to be enough for anybody" ' -test_expect_failure 'rebase (apply-backend)' ' - test_when_finished "git rebase --abort" && +test_expect_success 'rebase (apply-backend)' ' + test_when_finished "test_might_fail git rebase --abort" && git checkout -B testing localmods && # rebase (--apply) should not drop commits that start empty git rebase --apply upstream && test_write_lines D C B A >expect && git log --format=%s >actual && - test_cmp expect actual + test_todo test_cmp expect actual ' test_expect_success 'rebase --merge --empty=drop' ' diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index 7dc5a5c736..bb0fcef40e 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -165,12 +165,12 @@ test_expect_success 'additional command line cc (ascii)' ' grep "^ *S E Cipient <scipient@example.com>\$" hdrs5 ' -test_expect_failure 'additional command line cc (rfc822)' ' +test_expect_success 'additional command line cc (rfc822)' ' git config --replace-all format.headers "Cc: R E Cipient <rcipient@example.com>" && git format-patch --cc="S. E. Cipient <scipient@example.com>" --stdout main..side >patch5 && sed -e "/^\$/q" patch5 >hdrs5 && grep "^Cc: R E Cipient <rcipient@example.com>,\$" hdrs5 && - grep "^ *\"S. E. Cipient\" <scipient@example.com>\$" hdrs5 + test_todo grep "^ *\"S. E. Cipient\" <scipient@example.com>\$" hdrs5 ' test_expect_success 'command line headers' ' @@ -195,16 +195,16 @@ test_expect_success 'command line To: header (ascii)' ' grep "^To: R E Cipient <rcipient@example.com>\$" hdrs8 ' -test_expect_failure 'command line To: header (rfc822)' ' +test_expect_success 'command line To: header (rfc822)' ' git format-patch --to="R. E. Cipient <rcipient@example.com>" --stdout main..side >patch8 && sed -e "/^\$/q" patch8 >hdrs8 && - grep "^To: \"R. E. Cipient\" <rcipient@example.com>\$" hdrs8 + test_todo grep "^To: \"R. E. Cipient\" <rcipient@example.com>\$" hdrs8 ' -test_expect_failure 'command line To: header (rfc2047)' ' +test_expect_success 'command line To: header (rfc2047)' ' git format-patch --to="R Ä Cipient <rcipient@example.com>" --stdout main..side >patch8 && sed -e "/^\$/q" patch8 >hdrs8 && - grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= <rcipient@example.com>\$" hdrs8 + test_todo grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= <rcipient@example.com>\$" hdrs8 ' test_expect_success 'configuration To: header (ascii)' ' @@ -214,18 +214,18 @@ test_expect_success 'configuration To: header (ascii)' ' grep "^To: R E Cipient <rcipient@example.com>\$" hdrs9 ' -test_expect_failure 'configuration To: header (rfc822)' ' +test_expect_success 'configuration To: header (rfc822)' ' git config format.to "R. E. Cipient <rcipient@example.com>" && git format-patch --stdout main..side >patch9 && sed -e "/^\$/q" patch9 >hdrs9 && - grep "^To: \"R. E. Cipient\" <rcipient@example.com>\$" hdrs9 + test_todo grep "^To: \"R. E. Cipient\" <rcipient@example.com>\$" hdrs9 ' -test_expect_failure 'configuration To: header (rfc2047)' ' +test_expect_success 'configuration To: header (rfc2047)' ' git config format.to "R Ä Cipient <rcipient@example.com>" && git format-patch --stdout main..side >patch9 && sed -e "/^\$/q" patch9 >hdrs9 && - grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= <rcipient@example.com>\$" hdrs9 + test_todo grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= <rcipient@example.com>\$" hdrs9 ' # check_patch <patch>: Verify that <patch> looks like a half-sane diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 0f439c99d6..deb74a22f3 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -739,6 +739,7 @@ test_expect_failure () { } test_expect_success () { + test_todo_= test_start_ test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq= test "$#" = 2 || @@ -750,7 +751,12 @@ test_expect_success () { say >&3 "expecting success of $TEST_NUMBER.$test_count '$1': $2" if test_run_ "$2" then - test_ok_ "$1" + if test -n "$test_todo_" + then + test_known_broken_failure_ "$1" + else + test_ok_ "$1" + fi else test_failure_ "$@" fi @@ -1011,8 +1017,20 @@ list_contains () { # Returns success if the arguments indicate that a command should be # accepted by test_must_fail(). If the command is run with env, the env # and its corresponding variable settings will be stripped before we +# we test the command being run. +# +# For test_todo we allow a wider range of commands to and if the +# command is run with verbose then verbose will be stripped before we # test the command being run. + test_must_fail_acceptable () { + local name + name="$1" + shift + if test "$name" = "todo" && test "$1" = "verbose" + then + shift + fi if test "$1" = "env" then shift @@ -1033,44 +1051,22 @@ test_must_fail_acceptable () { git|__git*|test-tool|test_terminal) return 0 ;; + grep|test|test_*) + if test "$name" = "todo" + then + return 0 + fi + return 1 + ;; *) return 1 ;; esac } -# This is not among top-level (test_expect_success | test_expect_failure) -# but is a prefix that can be used in the test script, like: -# -# test_expect_success 'complain and die' ' -# do something && -# do something else && -# test_must_fail git checkout ../outerspace -# ' -# -# Writing this as "! git checkout ../outerspace" is wrong, because -# the failure could be due to a segv. We want a controlled failure. -# -# Accepts the following options: -# -# ok=<signal-name>[,<...>]: -# Don't treat an exit caused by the given signal as error. -# Multiple signals can be specified as a comma separated list. -# Currently recognized signal names are: sigpipe, success. -# (Don't use 'success', use 'test_might_fail' instead.) -# -# Do not use this to run anything but "git" and other specific testable -# commands (see test_must_fail_acceptable()). We are not in the -# business of vetting system supplied commands -- in other words, this -# is wrong: -# -# test_must_fail grep pattern output -# -# Instead use '!': -# -# ! grep pattern output - -test_must_fail () { +test_must_fail_helper () { + test_must_fail_name_="$1" + shift case "$1" in ok=*) _test_ok=${1#ok=} @@ -1080,36 +1076,90 @@ test_must_fail () { _test_ok= ;; esac - if ! test_must_fail_acceptable "$@" + if ! test_must_fail_acceptable $test_must_fail_name_ "$@" then - echo >&7 "test_must_fail: only 'git' is allowed: $*" + echo >&7 "test_$test_must_fail_name_: only 'git' is allowed: $*" return 1 fi "$@" 2>&7 exit_code=$? if test $exit_code -eq 0 && ! list_contains "$_test_ok" success then - echo >&4 "test_must_fail: command succeeded: $*" + echo >&4 "test_$test_must_fail_name_: command succeeded: $*" return 1 elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe then return 0 elif test $exit_code -gt 129 && test $exit_code -le 192 then - echo >&4 "test_must_fail: died by signal $(($exit_code - 128)): $*" + echo >&4 "test_$test_must_fail_name_: died by signal $(($exit_code - 128)): $*" return 1 elif test $exit_code -eq 127 then - echo >&4 "test_must_fail: command not found: $*" + echo >&4 "test_$test_must_fail_name_: command not found: $*" return 1 elif test $exit_code -eq 126 then - echo >&4 "test_must_fail: valgrind error: $*" + echo >&4 "test_$test_must_fail_name_: valgrind error: $*" return 1 fi return 0 } 7>&2 2>&4 +# This is used to mark commands that should succeed but do not due to +# a known issue. Marking the individual commands that fail rather than +# using test_expect_failure allows us to detect any unexpected +# failures. As with test_must_fail if the command is killed by a +# signal the test will fail. If the command unexpectedly succeeds then +# the test fails. For example: +# +# test_expect_success 'test a known failure' ' +# git foo 2>err && +# test_todo test_must_be_empty err +# ' +# +# This test will fail if "git foo" fails or err is unexpectedly empty + +test_todo () { + test_todo_=todo + test_must_fail_helper todo "$@" 2>&7 +} 7>&2 2>&4 + +# This is not among top-level (test_expect_success | test_expect_failure) +# but is a prefix that can be used in the test script, like: +# +# test_expect_success 'complain and die' ' +# do something && +# do something else && +# test_must_fail git checkout ../outerspace +# ' +# +# Writing this as "! git checkout ../outerspace" is wrong, because +# the failure could be due to a segv. We want a controlled failure. +# +# Accepts the following options: +# +# ok=<signal-name>[,<...>]: +# Don't treat an exit caused by the given signal as error. +# Multiple signals can be specified as a comma separated list. +# Currently recognized signal names are: sigpipe, success. +# (Don't use 'success', use 'test_might_fail' instead.) +# +# Do not use this to run anything but "git" and other specific testable +# commands (see test_must_fail_acceptable()). We are not in the +# business of vetting system supplied commands -- in other words, this +# is wrong: +# +# test_must_fail grep pattern output +# +# Instead use '!': +# +# ! grep pattern output + +test_must_fail () { + test_must_fail_helper must_fail "$@" 2>&7 +} 7>&2 2>&4 + # Similar to test_must_fail, but tolerates success, too. This is # meant to be used in contexts like: # @@ -1124,7 +1174,7 @@ test_must_fail () { # Accepts the same options as test_must_fail. test_might_fail () { - test_must_fail ok=success "$@" 2>&7 + test_must_fail_helper might_fail ok=success "$@" 2>&7 } 7>&2 2>&4 # Similar to test_must_fail and test_might_fail, but check that a
Phillip Wood <phillip.wood123@gmail.com> writes: > .... I had a look > at some of the conversions with your test_todo --want/--expect/--reset > and found the result really hard to follow. Junio's suggestions chimed > with some things I've been thinking about so I had a go at > implementing it and doing some sample conversions (see below). Marking > the individual commands that should fail is a big step forward and the > failing commands are checked to make sure they don't segfault etc. ;-) Another small detail in my suggestion that will make a huge difference in the end is not to introduce test_expect_todo as a separate top-level construct, and instead teach test_expect_success to pay attention to the use of test_todo "unfortunately this does not work yet" mark in it. It allows us to use test_todo in a shared test helper function and call them in test_expect_success, and when the step the test helper has trouble with gets fixed, the "unmark" step will be an isolated change. Your sample change seems to already have it, which is good. > diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh > index 0f439c99d6..deb74a22f3 100644 > --- a/t/test-lib-functions.sh > +++ b/t/test-lib-functions.sh > @@ -739,6 +739,7 @@ test_expect_failure () { > } > test_expect_success () { > + test_todo_= > test_start_ > test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq= > test "$#" = 2 || > @@ -750,7 +751,12 @@ test_expect_success () { > say >&3 "expecting success of $TEST_NUMBER.$test_count '$1': $2" > if test_run_ "$2" > then > - test_ok_ "$1" > + if test -n "$test_todo_" > + then > + test_known_broken_failure_ "$1" > + else > + test_ok_ "$1" > + fi > else > test_failure_ "$@" > fi
On 3/20/22 2:07 PM, Junio C Hamano wrote: > Phillip Wood <phillip.wood123@gmail.com> writes: > >> .... I had a look >> at some of the conversions with your test_todo --want/--expect/--reset >> and found the result really hard to follow. Junio's suggestions chimed >> with some things I've been thinking about so I had a go at >> implementing it and doing some sample conversions (see below). Marking >> the individual commands that should fail is a big step forward and the >> failing commands are checked to make sure they don't segfault etc. > > ;-) > > Another small detail in my suggestion that will make a huge > difference in the end is not to introduce test_expect_todo as a > separate top-level construct, and instead teach test_expect_success > to pay attention to the use of test_todo "unfortunately this does > not work yet" mark in it. It allows us to use test_todo in a shared > test helper function and call them in test_expect_success, and when > the step the test helper has trouble with gets fixed, the "unmark" > step will be an isolated change. > > Your sample change seems to already have it, which is good. After reading this thread, I agree with many of the ideas that were generated in response to this topic. The thing I'm hoping to see from a final version is that a top-level helper like test_expect_todo will expect at least one test_todo helper to be executed inside of the test (perhaps communicated by setting a special GIT_TEST_* environment variable?) and if any of the test_todo lines change from fail to pass, then that is communicated as a _failure_ from CI's perspective. Let us discover if we have accidentally "fixed" any of these test cases and update the tests accordingly. I can predict writing a test case with multiple test_todo lines that need to be updated to drop the test_todo helpers one-by-one as a change is being introduced. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > The thing I'm hoping to see from a final version is that a top-level > helper like test_expect_todo will expect at least one test_todo > helper to be executed inside of the test (perhaps communicated by > setting a special GIT_TEST_* environment variable?) and if any of I was hoping that we can do without test_expect_todo. test_expect_success can turn itself into test_expect_todo when it sees test_todo is invoked even once in it. And Phillip's outline actually implements the idea, if I am not mistaken. > the test_todo lines change from fail to pass, then that is > communicated as a _failure_ from CI's perspective. Let us discover > if we have accidentally "fixed" any of these test cases and update > the tests accordingly. In other words, we do not want to lose the "TODO fixed" we have been getting out of test_expect_failure, which I agree with. I am not sure if Phillip's outline had that feature. > I can predict writing a test case with multiple test_todo lines > that need to be updated to drop the test_todo helpers one-by-one > as a change is being introduced. Yes.
On 23/03/2022 22:13, Junio C Hamano wrote: > Derrick Stolee <derrickstolee@github.com> writes: > >> The thing I'm hoping to see from a final version is that a top-level >> helper like test_expect_todo will expect at least one test_todo >> helper to be executed inside of the test (perhaps communicated by >> setting a special GIT_TEST_* environment variable?) and if any of > > I was hoping that we can do without test_expect_todo. > test_expect_success can turn itself into test_expect_todo when it > sees test_todo is invoked even once in it. And Phillip's outline > actually implements the idea, if I am not mistaken. You are correct, there is no test_expect_todo in the outline I posted, test_todo lives within test_expect_success and works like test_must_fail. >> the test_todo lines change from fail to pass, then that is >> communicated as a _failure_ from CI's perspective. Let us discover >> if we have accidentally "fixed" any of these test cases and update >> the tests accordingly. > > In other words, we do not want to lose the "TODO fixed" we have been > getting out of test_expect_failure, which I agree with. I read Stolee's comments the other way, that we want the test harness to see the test fail rather passing as it does with the "TODO fixed" feature. In one of my early contributions I inadvertently fixed a submodule test and did not realize until someone pointed it out because the CI passes rather than fails when a test_expect_failure is fixed. > I am not > sure if Phillip's outline had that feature. In my outline the test fails if any command marked by test_todo is successful and test_todo prints a messaging saying the command was unexpectedly successful. Implementing the "TODO fixed" feature gets tricky when there is more than one test_todo within a single test_expect_success - what if one is test_todo command is successful and the others fail? Best Wishes Phillip >> I can predict writing a test case with multiple test_todo lines >> that need to be updated to drop the test_todo helpers one-by-one >> as a change is being introduced. > > Yes.
diff --git a/t/README b/t/README index f48e0542cdc..e0aa8ebb0eb 100644 --- a/t/README +++ b/t/README @@ -805,10 +805,28 @@ see test-lib-functions.sh for the full list and their options. test_expect_success PERL,PYTHON 'yo dawg' \ ' test $(perl -E 'print eval "1 +" . qx[python -c "print 2"]') == "4" ' + - test_expect_todo [<prereq>] <message> <script> + + A "test_expect_success" which on "success" will mark the test as a + TODO test, and on failure will emit a real failure. + + This should be used to mark a test whose exact bad behavior is + known, but whose outcome isn't preferred, to distinguish it from + those tests that use "test_expect_success" to indicate known and + preferred behavior. + + Like test_expect_success this function can optionally use a three + argument invocation with a prerequisite as the first argument. + - test_expect_failure [<prereq>] <message> <script> This is NOT the opposite of test_expect_success, but is used - to mark a test that demonstrates a known breakage. Unlike + to mark a test that demonstrates a known breakage whose exact failure + behavior isn't predictable. + + If the exact breakage is known the "test_expect_todo" function + should be used instead. Usethis function if it's hard to pin down + the exact nature of the failure. Unlike the usual test_expect_success tests, which say "ok" on success and "FAIL" on failure, this will say "FIXED" on success and "still broken" on failure. Failures from these diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh index b007f0efef2..e46638f1f7b 100755 --- a/t/t0000-basic.sh +++ b/t/t0000-basic.sh @@ -176,6 +176,8 @@ test_expect_success 'subtest: mixed results: a mixture of all possible results' test_expect_failure "pretend we have a known breakage" "false" test_expect_failure "pretend we have a known breakage" "false" test_expect_failure "pretend we have fixed a known breakage" "true" + test_expect_todo "pretend we have a known TODO" "true" + test_expect_todo "pretend we have a bad TODO" "false" test_done EOF check_sub_test_lib_test mixed-results2 <<-\EOF @@ -192,10 +194,13 @@ test_expect_success 'subtest: mixed results: a mixture of all possible results' > not ok 8 - pretend we have a known breakage # TODO known breakage > not ok 9 - pretend we have a known breakage # TODO known breakage > ok 10 - pretend we have fixed a known breakage # TODO known breakage vanished + > not ok 11 - pretend we have a known TODO # TODO known breakage + > not ok 12 - pretend we have a bad TODO (broken '\''test_expect_todo'\''!) + > # false > # 1 known breakage(s) vanished; please update test(s) - > # still have 2 known breakage(s) - > # failed 3 among remaining 7 test(s) - > 1..10 + > # still have 3 known breakage(s) + > # failed 4 among remaining 8 test(s) + > 1..12 EOF ' diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh index bc89371f534..b7023f5644c 100755 --- a/t/t1060-object-corruption.sh +++ b/t/t1060-object-corruption.sh @@ -112,8 +112,8 @@ test_expect_success 'clone --local detects missing objects' ' test_must_fail git clone --local missing missing-checkout ' -test_expect_failure 'clone --local detects misnamed objects' ' - test_must_fail git clone --local misnamed misnamed-checkout +test_expect_todo 'clone --local detects misnamed objects' ' + git clone --local misnamed misnamed-checkout ' test_expect_success 'fetch into corrupted repo with index-pack' ' diff --git a/t/t7815-grep-binary.sh b/t/t7815-grep-binary.sh index ac871287c03..2d17e05036e 100755 --- a/t/t7815-grep-binary.sh +++ b/t/t7815-grep-binary.sh @@ -64,8 +64,8 @@ test_expect_success 'git grep ile a' ' git grep ile a ' -test_expect_failure 'git grep .fi a' ' - git grep .fi a +test_expect_todo 'git grep .fi a' ' + test_must_fail git grep .fi a ' test_expect_success 'grep respects binary diff attribute' ' diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 0f439c99d61..a219b126d93 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -718,11 +718,13 @@ test_verify_prereq () { BUG "'$test_prereq' does not look like a prereq" } -test_expect_failure () { +_test_expect_todo () { + local func=$1 + shift test_start_ test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq= test "$#" = 2 || - BUG "not 2 or 3 parameters to test-expect-failure" + BUG "not 2 or 3 parameters to $func" test_verify_prereq export test_prereq if ! test_skip "$@" @@ -730,14 +732,32 @@ test_expect_failure () { say >&3 "checking known breakage of $TEST_NUMBER.$test_count '$1': $2" if test_run_ "$2" expecting_failure then - test_known_broken_ok_ "$1" + case "$func" in + test_expect_todo) test_known_broken_ok_ "$func" "$1" ;; + test_expect_failure) test_known_broken_ok_ "$func" "$1" ;; + esac else - test_known_broken_failure_ "$1" + case "$func" in + test_expect_todo) + test_title_=$1 + shift + test_failure_ "$test_title_ (broken 'test_expect_todo'!)" "$@" + ;; + test_expect_failure) test_known_broken_failure_ "$func" "$1" ;; + esac fi fi test_finish_ } +test_expect_failure () { + _test_expect_todo test_expect_failure "$@" +} + +test_expect_todo () { + _test_expect_todo test_expect_todo "$@" +} + test_expect_success () { test_start_ test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq= diff --git a/t/test-lib.sh b/t/test-lib.sh index 9af5fb7674d..fb23a6f6682 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -792,21 +792,45 @@ test_failure_ () { } test_known_broken_ok_ () { + local func=$1 + shift + if test -n "$write_junit_xml" then write_junit_xml_testcase "$* (breakage fixed)" fi - test_fixed=$(($test_fixed+1)) - say_color error "ok $test_count - $@ # TODO known breakage vanished" + + if test "$func" = "test_expect_todo" + then + # test_expect_todo + test_broken=$(($test_broken+1)) + say_color warn "not ok $test_count - $@ # TODO known breakage" + else + # test_expect_failure + test_fixed=$(($test_fixed+1)) + say_color error "ok $test_count - $@ # TODO known breakage vanished" + fi } test_known_broken_failure_ () { + local func=$1 + shift + if test -n "$write_junit_xml" then write_junit_xml_testcase "$* (known breakage)" fi - test_broken=$(($test_broken+1)) - say_color warn "not ok $test_count - $@ # TODO known breakage" + + if test "$func" = "test_expect_todo" + then + # test_expect_todo + test_fixed=$(($test_fixed+1)) + say_color error "not ok $test_count - $@ # TODO a 'known breakage' changed behavior!" + else + # test_expect_failure + test_broken=$(($test_broken+1)) + say_color warn "not ok $test_count - $@ # TODO known breakage" + fi } test_debug () {
Add an alternative to the "test_expect_failure" function. Like "test_expect_failure" it will marks a test as "not ok ... TODO" in the TAP output, and thus document that it's a "TODO" test that fails currently. Unlike "test_expect_failure" it asserts that the tested code in exactly one manner, and not any other. We'll thus stop conflating e.g. segfaults with other sorts of errors, and generally be able to tell if our "expected to fail" tests start failing in new and unexpected ways. In more detail, the these problems of "test_expect_failure" are solved by this new function. * When "test_expect_failure" is used in combination with "test_{must,might}_fail" it will hide segfaults or abort() failures, such as failures due to LSAN. This is because we do do the right thing in those helpers, but they themselves are expected to be used within "test_expect_success" and return 1. The "test_expect_failure" can't differentiate this from a "real" failure. We could change "test_{must,might}_fail" to appropriately carry forward the status code to work around this specific issue, but doing so would be a large semantic change in the current test suite. * More generally, "test_expect_failure" by design doesn't test what does, but just asserts that the test fails in some way. For some tests that truly fail in unpredictable ways this behavior makes sense, but it's almost never what our tests want. We know we e.g. have a revision at HEAD~ instead of HEAD, and would like to know conflate that with a potential segfault or other behavior change. In summary, this new function behaves exactly like "test_expect_success", except that its "success" is then reported as a "not ok .. TODO". Let's convert a few "test_expect_failure" uses to the new "test_expect_todo". For previous discussion on this topic see [1] and [2], in particular the points by Junio that it's desired that the "test_expect_failure" output differentiate the TODO tests from "test_expect_success". 1. https://lore.kernel.org/git/87tuhmk19c.fsf@evledraar.gmail.com/ 2. https://lore.kernel.org/git/xmqqo824e145.fsf@gitster.g/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- t/README | 20 +++++++++++++++++++- t/t0000-basic.sh | 11 ++++++++--- t/t1060-object-corruption.sh | 4 ++-- t/t7815-grep-binary.sh | 4 ++-- t/test-lib-functions.sh | 28 ++++++++++++++++++++++++---- t/test-lib.sh | 32 ++++++++++++++++++++++++++++---- 6 files changed, 83 insertions(+), 16 deletions(-)