diff mbox series

[1/7] test-lib: add a "test_expect_todo", similar to "test_expect_failure"

Message ID patch-1.7-4624abc2591-20220318T002951Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series test-lib-functions: a better "test_expect_failure" | expand

Commit Message

Ævar Arnfjörð Bjarmason March 18, 2022, 12:33 a.m. UTC
Add an alternative to the "test_expect_failure" function. Like
"test_expect_failure" it will marks a test as "not ok ... TODO" in the
TAP output, and thus document that it's a "TODO" test that fails
currently.

Unlike "test_expect_failure" it asserts that the tested code in
exactly one manner, and not any other. We'll thus stop conflating
e.g. segfaults with other sorts of errors, and generally be able to
tell if our "expected to fail" tests start failing in new and
unexpected ways.

In more detail, the these problems of "test_expect_failure" are solved
by this new function.

 * When "test_expect_failure" is used in combination with
   "test_{must,might}_fail" it will hide segfaults or abort()
   failures, such as failures due to LSAN.

   This is because we do do the right thing in those helpers, but they
   themselves are expected to be used within "test_expect_success" and
   return 1. The "test_expect_failure" can't differentiate this from a
   "real" failure.

   We could change "test_{must,might}_fail" to appropriately carry
   forward the status code to work around this specific issue, but
   doing so would be a large semantic change in the current test
   suite.

 * More generally, "test_expect_failure" by design doesn't test what
   does, but just asserts that the test fails in some way.

   For some tests that truly fail in unpredictable ways this behavior
   makes sense, but it's almost never what our tests want. We know we
   e.g. have a revision at HEAD~ instead of HEAD, and would like to
   know conflate that with a potential segfault or other behavior
   change.

In summary, this new function behaves exactly like
"test_expect_success", except that its "success" is then reported as a
"not ok .. TODO".

Let's convert a few "test_expect_failure" uses to the new
"test_expect_todo".

For previous discussion on this topic see [1] and [2], in particular
the points by Junio that it's desired that the "test_expect_failure"
output differentiate the TODO tests from "test_expect_success".

1. https://lore.kernel.org/git/87tuhmk19c.fsf@evledraar.gmail.com/
2. https://lore.kernel.org/git/xmqqo824e145.fsf@gitster.g/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/README                     | 20 +++++++++++++++++++-
 t/t0000-basic.sh             | 11 ++++++++---
 t/t1060-object-corruption.sh |  4 ++--
 t/t7815-grep-binary.sh       |  4 ++--
 t/test-lib-functions.sh      | 28 ++++++++++++++++++++++++----
 t/test-lib.sh                | 32 ++++++++++++++++++++++++++++----
 6 files changed, 83 insertions(+), 16 deletions(-)

Comments

Junio C Hamano March 18, 2022, 6:59 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add an alternative to the "test_expect_failure" function. Like
> "test_expect_failure" it will marks a test as "not ok ... TODO" in the

"will marks" -> "marks".

> TAP output, and thus document that it's a "TODO" test that fails
> currently.
>
> Unlike "test_expect_failure" it asserts that the tested code in
> exactly one manner, and not any other.

ECANNOTPARSE due to a verb missing.  For now I'll assume "It asserts
that the tested code fails in exactly one matter" and keep going.

> We'll thus stop conflating
> e.g. segfaults with other sorts of errors, and generally be able to
> tell if our "expected to fail" tests start failing in new and
> unexpected ways.

The above makes it sound wonderful but I got somewhat confused when
I tried to see how well it conveys what it wants to tell by picking
an example from this patch.  Say, for example:

> -test_expect_failure 'git grep .fi a' '
> -	git grep .fi a
> +test_expect_todo 'git grep .fi a' '
> +	test_must_fail git grep .fi a
>  '

So, we used to say

    "We eventually, when things are fixed, want 'git grep' to find a
    string '.fi' in file 'a', but currently this does not work".

Now the updated one says

    "We know 'git grep' fails to find string '.fi' in file 'a'"

If it is a trivial single statement test like the above, the
distinction does not matter, but if it were a test with multiple
steps, the readability would become quite different.  It would turn

	test_expect_failure 'sample test' '
		preparation 1 &&
		preparation 2 &&
		git command invocation that we want to succeed
	'

into

	test_expect_todo 'sample test' '
		preparation 1 &&
		preparation 2 &&
		test_must_fail git command invocation that we want to succeed
	'

"expect_failure" expects the whole thing to fail, so we will miss if
any preparation steps fail, but "expect_todo" is like "expect_success"
so it will help us debuging the test by catching a silly mistake in
preparation steps.

Marking the step that demonstrates the current shortcomings with
"MUST FAIL" is a bad taste, but let's pretend that we didn't notice
it ;-).  Other than that, it looks like an improvement.

>  * More generally, "test_expect_failure" by design doesn't test what
>    does, but just asserts that the test fails in some way.

Exactly.  

It matters in complex tests that !(A&B&C) is different from
(A&B&!C), the latter of which is what we want to do with tests that
document what currently does not work.

>   - test_expect_failure [<prereq>] <message> <script>
>  
>     This is NOT the opposite of test_expect_success, but is used
> -   to mark a test that demonstrates a known breakage.  Unlike
> +   to mark a test that demonstrates a known breakage whose exact failure
> +   behavior isn't predictable.
> +
> +   If the exact breakage is known the "test_expect_todo" function
> +   should be used instead. Usethis function if it's hard to pin down
> +   the exact nature of the failure. Unlike

"Usethis" -> "Use this"

> -test_expect_failure 'clone --local detects misnamed objects' '
> -	test_must_fail git clone --local misnamed misnamed-checkout
> +test_expect_todo 'clone --local detects misnamed objects' '
> +	git clone --local misnamed misnamed-checkout
>  '

Just like too many negatives confuse readers, I have to say this is
quite confusing.  Do we or do we not want "git clone" invocation to
succeed?

> +test_expect_failure () {
> +	_test_expect_todo test_expect_failure "$@"
> +}
> +
> +test_expect_todo () {
> +	_test_expect_todo test_expect_todo "$@"
> +}

It is a bit surprising that _test_expect_todo does not share more of
its implementation with test_expect_success, but let's pretend we
didn't see it.

Looks like a good first step.  I'd stop reading the series for now
at this one.
Junio C Hamano March 18, 2022, 8:50 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Marking the step that demonstrates the current shortcomings with
> "MUST FAIL" is a bad taste, but let's pretend that we didn't notice
> it ;-).  Other than that, it looks like an improvement.
> ...
>> +test_expect_failure () {
>> +	_test_expect_todo test_expect_failure "$@"
>> +}
>> +
>> +test_expect_todo () {
>> +	_test_expect_todo test_expect_todo "$@"
>> +}
>
> It is a bit surprising that _test_expect_todo does not share more of
> its implementation with test_expect_success, but let's pretend we
> didn't see it.

With a bit more tweak, I think this can be made palatable:

 * introduce something that is *NOT* test_must_fail but behaves like
   so, but with a bit of magic (see below).

 * do not introduce test_expect_todo, but use an improved version of
   test_expect_success.

Let's illustrate what I mean by starting from an example that we
want to have _after_ fixing an known breakage.  Let's say we expect
that our test preparation succeeds, 'git foo' fails when given a bad
option, 'git foo' runs successfully, and the command is expected to
emit certain output.  We may assert the ideal future world like so:

	test_expect_success 'make sure foo works the way we want' '
		preparatory step &&
		test_must_fail git foo --bad-option >error &&
		grep "expected error message" error &&
		! grep "unwanted error message" error &&
		git foo >output &&
		grep expected output &&
		! grep unwanted output
	'

Let's also imagine that right now, option parsing in "git foo",
works but the main execution of the command does not work.

With test_expect_todo, you have to write something like this
to document the current breakage:

	test_expect_todo 'document breakage' '
		preparatory step &&
		test_must_fail git foo --bad-option >error &&
		grep "expected error message" error &&
		! grep "unwanted error message" error &&
		test_must_fail git foo >output &&
		! grep expected output &&
		grep unwanted output
	'

You can see that this makes one thing unclear.

Among the two test_must_fail and two !, which one(s) document the
breakage?  In other words, which one of these four negations do we
wish to lift eventually?  The answer is the latter two (we said that
handling of "--bad-option" is already working), but it is not obvious
in the above test_expect_todo test sequence.

I'd suggest we allow our test to be written this way:

	test_expect_success 'make sure foo works the way we want' '
		preparatory step &&
		test_must_fail git foo --bad-option >error &&
		grep "expected error message" error &&
		! grep "unwanted error message" error &&
	test_ki git foo >output &&
	test_ki grep expected output &&
	test_ki ! grep unwanted output
	'

and teach test_expect_success that an invocation of test_ki ("known
issue"---a better name that is NOT test_must_fail very much welcome)
means we hope this test someday passes without test_ki but not
today, i.e. what your test_expect_todo means, and we unfortunately
have to expect that the lines annotated with test_ki would "fail".

To readers, test_ki should appear as if an annotation to a single
command that highlights what we want to eventually be able to fix,
and when the issue around the line marked as test_ki is fixed, we
can signal the fact by removing it from the beginning of these lines
(that is why the last one is "test_ki !" and not "!  test_ki").

To the test framework and the shell that is running the test,
test_ki would be almost identical to test_must_fail, i.e. runs the
rest of the command, catch segv and report, but otherwise the
failure from that "rest of the command" execution is turned into
"success" that lets &&-chain to continue.  In addition, it tells
the test_expect_success running it that a success of the test piece
should be shown as TODO (expected failure).

Hmm?
Ævar Arnfjörð Bjarmason March 18, 2022, 11:07 p.m. UTC | #3
On Fri, Mar 18 2022, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Marking the step that demonstrates the current shortcomings with
>> "MUST FAIL" is a bad taste, but let's pretend that we didn't notice
>> it ;-).  Other than that, it looks like an improvement.
>> ...
>>> +test_expect_failure () {
>>> +	_test_expect_todo test_expect_failure "$@"
>>> +}
>>> +
>>> +test_expect_todo () {
>>> +	_test_expect_todo test_expect_todo "$@"
>>> +}
>>
>> It is a bit surprising that _test_expect_todo does not share more of
>> its implementation with test_expect_success, but let's pretend we
>> didn't see it.
>
> With a bit more tweak, I think this can be made palatable:
>
>  * introduce something that is *NOT* test_must_fail but behaves like
>    so, but with a bit of magic (see below).
>
>  * do not introduce test_expect_todo, but use an improved version of
>    test_expect_success.
>
> Let's illustrate what I mean by starting from an example that we
> want to have _after_ fixing an known breakage.  Let's say we expect
> that our test preparation succeeds, 'git foo' fails when given a bad
> option, 'git foo' runs successfully, and the command is expected to
> emit certain output.  We may assert the ideal future world like so:
>
> 	test_expect_success 'make sure foo works the way we want' '
> 		preparatory step &&
> 		test_must_fail git foo --bad-option >error &&
> 		grep "expected error message" error &&
> 		! grep "unwanted error message" error &&
> 		git foo >output &&
> 		grep expected output &&
> 		! grep unwanted output
> 	'
>
> Let's also imagine that right now, option parsing in "git foo",
> works but the main execution of the command does not work.
>
> With test_expect_todo, you have to write something like this
> to document the current breakage:
>
> 	test_expect_todo 'document breakage' '
> 		preparatory step &&
> 		test_must_fail git foo --bad-option >error &&
> 		grep "expected error message" error &&
> 		! grep "unwanted error message" error &&
> 		test_must_fail git foo >output &&
> 		! grep expected output &&
> 		grep unwanted output
> 	'
>
> You can see that this makes one thing unclear.
>
> Among the two test_must_fail and two !, which one(s) document the
> breakage?  In other words, which one of these four negations do we
> wish to lift eventually?  The answer is the latter two (we said that
> handling of "--bad-option" is already working), but it is not obvious
> in the above test_expect_todo test sequence.
>
> I'd suggest we allow our test to be written this way:
>
> 	test_expect_success 'make sure foo works the way we want' '
> 		preparatory step &&
> 		test_must_fail git foo --bad-option >error &&
> 		grep "expected error message" error &&
> 		! grep "unwanted error message" error &&
> 	test_ki git foo >output &&
> 	test_ki grep expected output &&
> 	test_ki ! grep unwanted output
> 	'
>
> and teach test_expect_success that an invocation of test_ki ("known
> issue"---a better name that is NOT test_must_fail very much welcome)
> means we hope this test someday passes without test_ki but not
> today, i.e. what your test_expect_todo means, and we unfortunately
> have to expect that the lines annotated with test_ki would "fail".
>
> To readers, test_ki should appear as if an annotation to a single
> command that highlights what we want to eventually be able to fix,
> and when the issue around the line marked as test_ki is fixed, we
> can signal the fact by removing it from the beginning of these lines
> (that is why the last one is "test_ki !" and not "!  test_ki").
>
> To the test framework and the shell that is running the test,
> test_ki would be almost identical to test_must_fail, i.e. runs the
> rest of the command, catch segv and report, but otherwise the
> failure from that "rest of the command" execution is turned into
> "success" that lets &&-chain to continue.  In addition, it tells
> the test_expect_success running it that a success of the test piece
> should be shown as TODO (expected failure).
>
> Hmm?

Have you had the time to look past patch 1/7 of this series? 2/7
introduces a "test_todo" helper, the "test_expect_todo" is just the
basic top-level primitive.

I don't think we can categorically replace all of the
"test_expect_failure" cases, because in some of those it's too much of a
hassle to assert the exact current behavior, or we don't really care.

But I think for most cases instead of a:

	test_ki ! grep unwanted output

It makes more sense to do (as that helper does):

	test_todo grep --want yay --expect unwanted -- output

Which is quite a handful, which is why the series goes on to
e.g. introduce a todo_test_cmp, used e.g. like this (and we can easily
add more wrappers for common cases):

	cat >want <<-EOF &&
	$(git rev-parse HEAD)
	EOF
	cat >expect <<-\EOF &&
	error: can't rev-parse stuff
	EOF
	test_might_fail git some-cmd HEAD >actual &&
	todo_test_cmp want expect actual

I.e. if you just want to throw your hands in the air and say "I don't
care how this fails and just emit a 'not ok .. TODO' line" we already
have test_expect_failure for that use-case.

I think in most other cases documenting that something is broken or
should behave differently shouldn't be synonymous with not caring *how*
that unwanted behavior works right now.

Understanding of your past commentary on this topic is that you strongly
objected to not having the test suite output reflect that a given test
was "not ok ... TODO" in some way. I.e. even though I think
"test_expect_success" has the approximate *semantics* we want we
shouldn't use those.

But I think the combination of "test_expect_todo" and the "test_todo"
primitive should satisfy that, and will give us accurate assertions
about what we're actually doing now.
Junio C Hamano March 19, 2022, 7:12 a.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> emit certain output.  We may assert the ideal future world like so:
>>
>> 	test_expect_success 'make sure foo works the way we want' '
>> 		preparatory step &&
>> 		test_must_fail git foo --bad-option >error &&
>> 		grep "expected error message" error &&
>> 		! grep "unwanted error message" error &&
>> 		git foo >output &&
>> 		grep expected output &&
>> 		! grep unwanted output
>> 	'
>>
>> Let's also imagine that right now, option parsing in "git foo",
>> works but the main execution of the command does not work.
>>
>> With test_expect_todo, you have to write something like this
>> to document the current breakage:
>>
>> 	test_expect_todo 'document breakage' '
>> 		preparatory step &&
>> 		test_must_fail git foo --bad-option >error &&
>> 		grep "expected error message" error &&
>> 		! grep "unwanted error message" error &&
>> 		test_must_fail git foo >output &&
>> 		! grep expected output &&
>> 		grep unwanted output
>> 	'
>>
>> You can see that this makes one thing unclear.
>>
>> Among the two test_must_fail and two !, which one(s) document the
>> breakage?  In other words, which one of these four negations do we
>> wish to lift eventually?  The answer is the latter two (we said that
>> handling of "--bad-option" is already working), but it is not obvious
>> in the above test_expect_todo test sequence.
>>
>> I'd suggest we allow our test to be written this way:
>>
>> 	test_expect_success 'make sure foo works the way we want' '
>> 		preparatory step &&
>> 		test_must_fail git foo --bad-option >error &&
>> 		grep "expected error message" error &&
>> 		! grep "unwanted error message" error &&
>> 	test_ki git foo >output &&
>> 	test_ki grep expected output &&
>> 	test_ki ! grep unwanted output
>> 	'
>>
>> and teach test_expect_success that an invocation of test_ki ("known
>> issue"---a better name that is NOT test_must_fail very much welcome)
>> means we hope this test someday passes without test_ki but not
>> today, i.e. what your test_expect_todo means, and we unfortunately
>> have to expect that the lines annotated with test_ki would "fail".

> Have you had the time to look past patch 1/7 of this series? 2/7
> introduces a "test_todo" helper, the "test_expect_todo" is just the
> basic top-level primitive.

No, and I do not have to.  I care about the most basic form first,
and if you cannot get it right, it is not interesting.  You can
consider the test_ki above as the primitive form of your "test_todo"
that says "I want the command to give true, but I know it currently
gives false".

And quite honestly, I am not interested in _how_ it currently
happens to break.  We may want the command being tested to
eventually count three commits, but due to a bug, it may only count
one.  You may say "test_todo count --want 3 --expect 1 blah", but
the "expect" part is much less interesting than the fact that the
command being tested on _that_ line (not the whole sequence run with
test_expect_failure) is clearly documented to want 3 but currently
is broken, and it can be clearly distinguished from the normal
test_must_fail or ! that documents that we do want a failure out of
the command being tested there.

So with or without the "higher level" wrappers, how else, other than
the way I showed in the message you are responding to as a rewrite
of the example to use test_expect_todo, that uses two test_must_fail
and two ! and makes which ones are expected failure and which ones
are documentation of the current breakage, do you propose to write
the equivalent?  It may be that your test_todo may be a different
way to spell the test_ki marker I showed above, and if that is the
case it is perfectly fine, but I want it to be THE primitive, not
test_must_fail or !, to mark a single command in the test that
currently does not work as expected.

> I don't think we can categorically replace all of the
> "test_expect_failure" cases, because in some of those it's too much of a
> hassle to assert the exact current behavior, or we don't really care.
>
> But I think for most cases instead of a:
>
> 	test_ki ! grep unwanted output
>
> It makes more sense to do (as that helper does):
>
> 	test_todo grep --want yay --expect unwanted -- output

My take is the complete opposite.  We can and should start small,
and how exactly the current implementation happens to be broken does
not matter most of the time.
Ævar Arnfjörð Bjarmason March 19, 2022, 11:11 a.m. UTC | #5
On Sat, Mar 19 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> emit certain output.  We may assert the ideal future world like so:
>>>
>>> 	test_expect_success 'make sure foo works the way we want' '
>>> 		preparatory step &&
>>> 		test_must_fail git foo --bad-option >error &&
>>> 		grep "expected error message" error &&
>>> 		! grep "unwanted error message" error &&
>>> 		git foo >output &&
>>> 		grep expected output &&
>>> 		! grep unwanted output
>>> 	'
>>>
>>> Let's also imagine that right now, option parsing in "git foo",
>>> works but the main execution of the command does not work.
>>>
>>> With test_expect_todo, you have to write something like this
>>> to document the current breakage:
>>>
>>> 	test_expect_todo 'document breakage' '
>>> 		preparatory step &&
>>> 		test_must_fail git foo --bad-option >error &&
>>> 		grep "expected error message" error &&
>>> 		! grep "unwanted error message" error &&
>>> 		test_must_fail git foo >output &&
>>> 		! grep expected output &&
>>> 		grep unwanted output
>>> 	'
>>>
>>> You can see that this makes one thing unclear.
>>>
>>> Among the two test_must_fail and two !, which one(s) document the
>>> breakage?  In other words, which one of these four negations do we
>>> wish to lift eventually?  The answer is the latter two (we said that
>>> handling of "--bad-option" is already working), but it is not obvious
>>> in the above test_expect_todo test sequence.
>>>
>>> I'd suggest we allow our test to be written this way:
>>>
>>> 	test_expect_success 'make sure foo works the way we want' '
>>> 		preparatory step &&
>>> 		test_must_fail git foo --bad-option >error &&
>>> 		grep "expected error message" error &&
>>> 		! grep "unwanted error message" error &&
>>> 	test_ki git foo >output &&
>>> 	test_ki grep expected output &&
>>> 	test_ki ! grep unwanted output
>>> 	'
>>>
>>> and teach test_expect_success that an invocation of test_ki ("known
>>> issue"---a better name that is NOT test_must_fail very much welcome)
>>> means we hope this test someday passes without test_ki but not
>>> today, i.e. what your test_expect_todo means, and we unfortunately
>>> have to expect that the lines annotated with test_ki would "fail".
>
>> Have you had the time to look past patch 1/7 of this series? 2/7
>> introduces a "test_todo" helper, the "test_expect_todo" is just the
>> basic top-level primitive.
>
> No, and I do not have to.  I care about the most basic form first,
> and if you cannot get it right, it is not interesting.  You can
> consider the test_ki above as the primitive form of your "test_todo"
> that says "I want the command to give true, but I know it currently
> gives false".

Sure, and I do have that implemented. If you're just asking that my
"test_todo" or another helper do that by default, then that's easy.

I.e. I've got that, but not as one short "test_*" verb.

> And quite honestly, I am not interested in _how_ it currently
> happens to break.  We may want the command being tested to
> eventually count three commits, but due to a bug, it may only count
> one.  You may say "test_todo count --want 3 --expect 1 blah", but
> the "expect" part is much less interesting than the fact that the
> command being tested on _that_ line (not the whole sequence run with
> test_expect_failure) is clearly documented to want 3 but currently
> is broken, and it can be clearly distinguished from the normal
> test_must_fail or ! that documents that we do want a failure out of
> the command being tested there.

Yes, if you don't want to test the exact behavior you have/want that's
also easy.

> So with or without the "higher level" wrappers, how else, other than
> the way I showed in the message you are responding to as a rewrite
> of the example to use test_expect_todo, that uses two test_must_fail
> and two ! and makes which ones are expected failure and which ones
> are documentation of the current breakage, do you propose to write
> the equivalent?  It may be that your test_todo may be a different
> way to spell the test_ki marker I showed above, and if that is the
> case it is perfectly fine, but I want it to be THE primitive, not
> test_must_fail or !, to mark a single command in the test that
> currently does not work as expected.

Sure, yes it's basically a different way to spell the same thing....

>> I don't think we can categorically replace all of the
>> "test_expect_failure" cases, because in some of those it's too much of a
>> hassle to assert the exact current behavior, or we don't really care.
>>
>> But I think for most cases instead of a:
>>
>> 	test_ki ! grep unwanted output
>>
>> It makes more sense to do (as that helper does):
>>
>> 	test_todo grep --want yay --expect unwanted -- output
>
> My take is the complete opposite.  We can and should start small,
> and how exactly the current implementation happens to be broken does
> not matter most of the time.

Well, the tip of this series leaves ~20 uses of test_expect_todo v.s. a
remaining ~100 uses of test_expect_failure, so it is a small start. I
converted those things I thought made the most sense.

But I do think you want to test at least a fuzzy "how exactly" most of
the time. The reason I worked on this was because while authoring the
series you merged in ea05fd5fbf7 (Merge branch
'ab/keep-git-exit-codes-in-tests', 2022-03-16) I found that we had
various test_expect_failure that failed in ways very different than what
we'd expect.

Or, saying that something exits non-zero and we'd like to fix it isn't
the same as saying that we'd like to e.g. exclude it from SANITIZE=leak
or SANITIZE=address testing. I.e. it still shouldn't leak, double-free()
or run into a BUG(), and if it does we'd like to know most of the time.

I think the only sensible thing to do to fix that is to have the
semantics of test_expect_todo, within that you can always decide to
ignore individual exit codes, but you can't really do it the other way
around (which is what test_expect_failure does).
Phillip Wood March 20, 2022, 3:13 p.m. UTC | #6
Hi Ævar

On 19/03/2022 11:11, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Mar 19 2022, Junio C Hamano wrote:
>[...] 
>>> I don't think we can categorically replace all of the
>>> "test_expect_failure" cases, because in some of those it's too much of a
>>> hassle to assert the exact current behavior, or we don't really care.
>>>
>>> But I think for most cases instead of a:
>>>
>>> 	test_ki ! grep unwanted output
>>>
>>> It makes more sense to do (as that helper does):
>>>
>>> 	test_todo grep --want yay --expect unwanted -- output
>>
>> My take is the complete opposite.  We can and should start small,
>> and how exactly the current implementation happens to be broken does
>> not matter most of the time.
> 
> Well, the tip of this series leaves ~20 uses of test_expect_todo v.s. a
> remaining ~100 uses of test_expect_failure, so it is a small start. I
> converted those things I thought made the most sense.
> 
> But I do think you want to test at least a fuzzy "how exactly" most of
> the time. The reason I worked on this was because while authoring the
> series you merged in ea05fd5fbf7 (Merge branch
> 'ab/keep-git-exit-codes-in-tests', 2022-03-16) I found that we had
> various test_expect_failure that failed in ways very different than what
> we'd expect.
> 
> Or, saying that something exits non-zero and we'd like to fix it isn't
> the same as saying that we'd like to e.g. exclude it from SANITIZE=leak
> or SANITIZE=address testing. I.e. it still shouldn't leak, double-free()
> or run into a BUG(), and if it does we'd like to know most of the time.

I think having test_todo based on test_must_fail as Junio
suggested avoids this as it means the test will still fail on SIGSEV
or SIGABRT (if we don't already do so we can make LSAN exit with the
same code as vargrind which test_must_fail checks for). I had a look
at some of the conversions with your test_todo --want/--expect/--reset
and found the result really hard to follow. Junio's suggestions chimed
with some things I've been thinking about so I had a go at
implementing it and doing some sample conversions (see below). Marking
the individual commands that should fail is a big step forward and the
failing commands are checked to make sure they don't segfault etc.

Best Wishes

Phillip

---- >8 -----
  t/t0000-basic.sh                |   9 +++++--
  t/t3401-rebase-and-am-rename.sh |  12 +++++-----
  t/t3424-rebase-empty.sh         |   6 ++---
  t/t4014-format-patch.sh         |  20 ++++++++--------
  t/test-lib-functions.sh         | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------
  5 files changed, 116 insertions(+), 61 deletions(-)

diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index b007f0efef..53217d4cd5 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -103,16 +103,21 @@ test_expect_success 'subtest: 2/3 tests passing' '
  
  test_expect_success 'subtest: a failing TODO test' '
  	write_and_run_sub_test_lib_test failing-todo <<-\EOF &&
+	test_false () {
+		false
+	}
  	test_expect_success "passing test" "true"
  	test_expect_failure "pretend we have a known breakage" "false"
+	test_expect_success "known todo" "test_todo test_false"
  	test_done
  	EOF
  	check_sub_test_lib_test failing-todo <<-\EOF
  	> ok 1 - passing test
  	> not ok 2 - pretend we have a known breakage # TODO known breakage
-	> # still have 1 known breakage(s)
+	> not ok 3 - known todo # TODO known breakage
+	> # still have 2 known breakage(s)
  	> # passed all remaining 1 test(s)
-	> 1..2
+	> 1..3
  	EOF
  '
  
diff --git a/t/t3401-rebase-and-am-rename.sh b/t/t3401-rebase-and-am-rename.sh
index f18bae9450..cc5da9f5af 100755
--- a/t/t3401-rebase-and-am-rename.sh
+++ b/t/t3401-rebase-and-am-rename.sh
@@ -52,7 +52,7 @@ test_expect_success 'rebase --interactive: directory rename detected' '
  	)
  '
  
-test_expect_failure 'rebase --apply: directory rename detected' '
+test_expect_success 'rebase --apply: directory rename detected' '
  	(
  		cd dir-rename &&
  
@@ -63,8 +63,8 @@ test_expect_failure 'rebase --apply: directory rename detected' '
  		git ls-files -s >out &&
  		test_line_count = 5 out &&
  
-		test_path_is_file y/d &&
-		test_path_is_missing x/d
+		test_todo test_path_is_file y/d &&
+		test_todo test_path_is_missing x/d
  	)
  '
  
@@ -84,7 +84,7 @@ test_expect_success 'rebase --merge: directory rename detected' '
  	)
  '
  
-test_expect_failure 'am: directory rename detected' '
+test_expect_success 'am: directory rename detected' '
  	(
  		cd dir-rename &&
  
@@ -97,8 +97,8 @@ test_expect_failure 'am: directory rename detected' '
  		git ls-files -s >out &&
  		test_line_count = 5 out &&
  
-		test_path_is_file y/d &&
-		test_path_is_missing x/d
+		test_todo test_path_is_file y/d &&
+		test_todo test_path_is_missing x/d
  	)
  '
  
diff --git a/t/t3424-rebase-empty.sh b/t/t3424-rebase-empty.sh
index 5e1045a0af..b7cae26075 100755
--- a/t/t3424-rebase-empty.sh
+++ b/t/t3424-rebase-empty.sh
@@ -34,15 +34,15 @@ test_expect_success 'setup test repository' '
  	git commit -m "Five letters ought to be enough for anybody"
  '
  
-test_expect_failure 'rebase (apply-backend)' '
-	test_when_finished "git rebase --abort" &&
+test_expect_success 'rebase (apply-backend)' '
+	test_when_finished "test_might_fail git rebase --abort" &&
  	git checkout -B testing localmods &&
  	# rebase (--apply) should not drop commits that start empty
  	git rebase --apply upstream &&
  
  	test_write_lines D C B A >expect &&
  	git log --format=%s >actual &&
-	test_cmp expect actual
+	test_todo test_cmp expect actual
  '
  
  test_expect_success 'rebase --merge --empty=drop' '
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 7dc5a5c736..bb0fcef40e 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -165,12 +165,12 @@ test_expect_success 'additional command line cc (ascii)' '
  	grep "^ *S E Cipient <scipient@example.com>\$" hdrs5
  '
  
-test_expect_failure 'additional command line cc (rfc822)' '
+test_expect_success 'additional command line cc (rfc822)' '
  	git config --replace-all format.headers "Cc: R E Cipient <rcipient@example.com>" &&
  	git format-patch --cc="S. E. Cipient <scipient@example.com>" --stdout main..side >patch5 &&
  	sed -e "/^\$/q" patch5 >hdrs5 &&
  	grep "^Cc: R E Cipient <rcipient@example.com>,\$" hdrs5 &&
-	grep "^ *\"S. E. Cipient\" <scipient@example.com>\$" hdrs5
+	test_todo grep "^ *\"S. E. Cipient\" <scipient@example.com>\$" hdrs5
  '
  
  test_expect_success 'command line headers' '
@@ -195,16 +195,16 @@ test_expect_success 'command line To: header (ascii)' '
  	grep "^To: R E Cipient <rcipient@example.com>\$" hdrs8
  '
  
-test_expect_failure 'command line To: header (rfc822)' '
+test_expect_success 'command line To: header (rfc822)' '
  	git format-patch --to="R. E. Cipient <rcipient@example.com>" --stdout main..side >patch8 &&
  	sed -e "/^\$/q" patch8 >hdrs8 &&
-	grep "^To: \"R. E. Cipient\" <rcipient@example.com>\$" hdrs8
+	test_todo grep "^To: \"R. E. Cipient\" <rcipient@example.com>\$" hdrs8
  '
  
-test_expect_failure 'command line To: header (rfc2047)' '
+test_expect_success 'command line To: header (rfc2047)' '
  	git format-patch --to="R Ä Cipient <rcipient@example.com>" --stdout main..side >patch8 &&
  	sed -e "/^\$/q" patch8 >hdrs8 &&
-	grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= <rcipient@example.com>\$" hdrs8
+	test_todo grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= <rcipient@example.com>\$" hdrs8
  '
  
  test_expect_success 'configuration To: header (ascii)' '
@@ -214,18 +214,18 @@ test_expect_success 'configuration To: header (ascii)' '
  	grep "^To: R E Cipient <rcipient@example.com>\$" hdrs9
  '
  
-test_expect_failure 'configuration To: header (rfc822)' '
+test_expect_success 'configuration To: header (rfc822)' '
  	git config format.to "R. E. Cipient <rcipient@example.com>" &&
  	git format-patch --stdout main..side >patch9 &&
  	sed -e "/^\$/q" patch9 >hdrs9 &&
-	grep "^To: \"R. E. Cipient\" <rcipient@example.com>\$" hdrs9
+	test_todo grep "^To: \"R. E. Cipient\" <rcipient@example.com>\$" hdrs9
  '
  
-test_expect_failure 'configuration To: header (rfc2047)' '
+test_expect_success 'configuration To: header (rfc2047)' '
  	git config format.to "R Ä Cipient <rcipient@example.com>" &&
  	git format-patch --stdout main..side >patch9 &&
  	sed -e "/^\$/q" patch9 >hdrs9 &&
-	grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= <rcipient@example.com>\$" hdrs9
+	test_todo grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= <rcipient@example.com>\$" hdrs9
  '
  
  # check_patch <patch>: Verify that <patch> looks like a half-sane
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 0f439c99d6..deb74a22f3 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -739,6 +739,7 @@ test_expect_failure () {
  }
  
  test_expect_success () {
+	test_todo_=
  	test_start_
  	test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq=
  	test "$#" = 2 ||
@@ -750,7 +751,12 @@ test_expect_success () {
  		say >&3 "expecting success of $TEST_NUMBER.$test_count '$1': $2"
  		if test_run_ "$2"
  		then
-			test_ok_ "$1"
+			if test -n "$test_todo_"
+			then
+				test_known_broken_failure_ "$1"
+			else
+				test_ok_ "$1"
+			fi
  		else
  			test_failure_ "$@"
  		fi
@@ -1011,8 +1017,20 @@ list_contains () {
  # Returns success if the arguments indicate that a command should be
  # accepted by test_must_fail(). If the command is run with env, the env
  # and its corresponding variable settings will be stripped before we
+# we test the command being run.
+#
+# For test_todo we allow a wider range of commands to and if the
+# command is run with verbose then verbose will be stripped before we
  # test the command being run.
+
  test_must_fail_acceptable () {
+	local name
+	name="$1"
+	shift
+	if test "$name" = "todo" && test "$1" = "verbose"
+	then
+		shift
+	fi
  	if test "$1" = "env"
  	then
  		shift
@@ -1033,44 +1051,22 @@ test_must_fail_acceptable () {
  	git|__git*|test-tool|test_terminal)
  		return 0
  		;;
+	grep|test|test_*)
+		if test "$name" = "todo"
+		then
+		    return 0
+		fi
+		return 1
+		;;
  	*)
  		return 1
  		;;
  	esac
  }
  
-# This is not among top-level (test_expect_success | test_expect_failure)
-# but is a prefix that can be used in the test script, like:
-#
-#	test_expect_success 'complain and die' '
-#           do something &&
-#           do something else &&
-#	    test_must_fail git checkout ../outerspace
-#	'
-#
-# Writing this as "! git checkout ../outerspace" is wrong, because
-# the failure could be due to a segv.  We want a controlled failure.
-#
-# Accepts the following options:
-#
-#   ok=<signal-name>[,<...>]:
-#     Don't treat an exit caused by the given signal as error.
-#     Multiple signals can be specified as a comma separated list.
-#     Currently recognized signal names are: sigpipe, success.
-#     (Don't use 'success', use 'test_might_fail' instead.)
-#
-# Do not use this to run anything but "git" and other specific testable
-# commands (see test_must_fail_acceptable()).  We are not in the
-# business of vetting system supplied commands -- in other words, this
-# is wrong:
-#
-#    test_must_fail grep pattern output
-#
-# Instead use '!':
-#
-#    ! grep pattern output
-
-test_must_fail () {
+test_must_fail_helper () {
+	test_must_fail_name_="$1"
+	shift
  	case "$1" in
  	ok=*)
  		_test_ok=${1#ok=}
@@ -1080,36 +1076,90 @@ test_must_fail () {
  		_test_ok=
  		;;
  	esac
-	if ! test_must_fail_acceptable "$@"
+	if ! test_must_fail_acceptable $test_must_fail_name_ "$@"
  	then
-		echo >&7 "test_must_fail: only 'git' is allowed: $*"
+		echo >&7 "test_$test_must_fail_name_: only 'git' is allowed: $*"
  		return 1
  	fi
  	"$@" 2>&7
  	exit_code=$?
  	if test $exit_code -eq 0 && ! list_contains "$_test_ok" success
  	then
-		echo >&4 "test_must_fail: command succeeded: $*"
+		echo >&4 "test_$test_must_fail_name_: command succeeded: $*"
  		return 1
  	elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe
  	then
  		return 0
  	elif test $exit_code -gt 129 && test $exit_code -le 192
  	then
-		echo >&4 "test_must_fail: died by signal $(($exit_code - 128)): $*"
+		echo >&4 "test_$test_must_fail_name_: died by signal $(($exit_code - 128)): $*"
  		return 1
  	elif test $exit_code -eq 127
  	then
-		echo >&4 "test_must_fail: command not found: $*"
+		echo >&4 "test_$test_must_fail_name_: command not found: $*"
  		return 1
  	elif test $exit_code -eq 126
  	then
-		echo >&4 "test_must_fail: valgrind error: $*"
+		echo >&4 "test_$test_must_fail_name_: valgrind error: $*"
  		return 1
  	fi
  	return 0
  } 7>&2 2>&4
  
+# This is used to mark commands that should succeed but do not due to
+# a known issue. Marking the individual commands that fail rather than
+# using test_expect_failure allows us to detect any unexpected
+# failures. As with test_must_fail if the command is killed by a
+# signal the test will fail. If the command unexpectedly succeeds then
+# the test fails. For example:
+#
+#	test_expect_success 'test a known failure' '
+#		git foo 2>err &&
+#		test_todo test_must_be_empty err
+#	'
+#
+# This test will fail if "git foo" fails or err is unexpectedly empty
+
+test_todo () {
+	test_todo_=todo
+	test_must_fail_helper todo "$@" 2>&7
+} 7>&2 2>&4
+
+# This is not among top-level (test_expect_success | test_expect_failure)
+# but is a prefix that can be used in the test script, like:
+#
+#	test_expect_success 'complain and die' '
+#           do something &&
+#           do something else &&
+#	    test_must_fail git checkout ../outerspace
+#	'
+#
+# Writing this as "! git checkout ../outerspace" is wrong, because
+# the failure could be due to a segv.  We want a controlled failure.
+#
+# Accepts the following options:
+#
+#   ok=<signal-name>[,<...>]:
+#     Don't treat an exit caused by the given signal as error.
+#     Multiple signals can be specified as a comma separated list.
+#     Currently recognized signal names are: sigpipe, success.
+#     (Don't use 'success', use 'test_might_fail' instead.)
+#
+# Do not use this to run anything but "git" and other specific testable
+# commands (see test_must_fail_acceptable()).  We are not in the
+# business of vetting system supplied commands -- in other words, this
+# is wrong:
+#
+#    test_must_fail grep pattern output
+#
+# Instead use '!':
+#
+#    ! grep pattern output
+
+test_must_fail () {
+	test_must_fail_helper must_fail "$@" 2>&7
+} 7>&2 2>&4
+
  # Similar to test_must_fail, but tolerates success, too.  This is
  # meant to be used in contexts like:
  #
@@ -1124,7 +1174,7 @@ test_must_fail () {
  # Accepts the same options as test_must_fail.
  
  test_might_fail () {
-	test_must_fail ok=success "$@" 2>&7
+	test_must_fail_helper might_fail ok=success "$@" 2>&7
  } 7>&2 2>&4
  
  # Similar to test_must_fail and test_might_fail, but check that a
Junio C Hamano March 20, 2022, 6:07 p.m. UTC | #7
Phillip Wood <phillip.wood123@gmail.com> writes:

> .... I had a look
> at some of the conversions with your test_todo --want/--expect/--reset
> and found the result really hard to follow. Junio's suggestions chimed
> with some things I've been thinking about so I had a go at
> implementing it and doing some sample conversions (see below). Marking
> the individual commands that should fail is a big step forward and the
> failing commands are checked to make sure they don't segfault etc.

;-)

Another small detail in my suggestion that will make a huge
difference in the end is not to introduce test_expect_todo as a
separate top-level construct, and instead teach test_expect_success
to pay attention to the use of test_todo "unfortunately this does
not work yet" mark in it.  It allows us to use test_todo in a shared
test helper function and call them in test_expect_success, and when
the step the test helper has trouble with gets fixed, the "unmark"
step will be an isolated change.

Your sample change seems to already have it, which is good.

> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 0f439c99d6..deb74a22f3 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -739,6 +739,7 @@ test_expect_failure () {
>  }
>    test_expect_success () {
> +	test_todo_=
>  	test_start_
>  	test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq=
>  	test "$#" = 2 ||
> @@ -750,7 +751,12 @@ test_expect_success () {
>  		say >&3 "expecting success of $TEST_NUMBER.$test_count '$1': $2"
>  		if test_run_ "$2"
>  		then
> -			test_ok_ "$1"
> +			if test -n "$test_todo_"
> +			then
> +				test_known_broken_failure_ "$1"
> +			else
> +				test_ok_ "$1"
> +			fi
>  		else
>  			test_failure_ "$@"
>  		fi
Derrick Stolee March 22, 2022, 2:43 p.m. UTC | #8
On 3/20/22 2:07 PM, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> .... I had a look
>> at some of the conversions with your test_todo --want/--expect/--reset
>> and found the result really hard to follow. Junio's suggestions chimed
>> with some things I've been thinking about so I had a go at
>> implementing it and doing some sample conversions (see below). Marking
>> the individual commands that should fail is a big step forward and the
>> failing commands are checked to make sure they don't segfault etc.
> 
> ;-)
> 
> Another small detail in my suggestion that will make a huge
> difference in the end is not to introduce test_expect_todo as a
> separate top-level construct, and instead teach test_expect_success
> to pay attention to the use of test_todo "unfortunately this does
> not work yet" mark in it.  It allows us to use test_todo in a shared
> test helper function and call them in test_expect_success, and when
> the step the test helper has trouble with gets fixed, the "unmark"
> step will be an isolated change.
> 
> Your sample change seems to already have it, which is good.

After reading this thread, I agree with many of the ideas that were
generated in response to this topic.

The thing I'm hoping to see from a final version is that a top-level
helper like test_expect_todo will expect at least one test_todo
helper to be executed inside of the test (perhaps communicated by
setting a special GIT_TEST_* environment variable?) and if any of
the test_todo lines change from fail to pass, then that is
communicated as a _failure_ from CI's perspective. Let us discover
if we have accidentally "fixed" any of these test cases and update
the tests accordingly.

I can predict writing a test case with multiple test_todo lines
that need to be updated to drop the test_todo helpers one-by-one
as a change is being introduced.

Thanks,
-Stolee
Junio C Hamano March 23, 2022, 10:13 p.m. UTC | #9
Derrick Stolee <derrickstolee@github.com> writes:

> The thing I'm hoping to see from a final version is that a top-level
> helper like test_expect_todo will expect at least one test_todo
> helper to be executed inside of the test (perhaps communicated by
> setting a special GIT_TEST_* environment variable?) and if any of

I was hoping that we can do without test_expect_todo.
test_expect_success can turn itself into test_expect_todo when it
sees test_todo is invoked even once in it.  And Phillip's outline
actually implements the idea, if I am not mistaken.

> the test_todo lines change from fail to pass, then that is
> communicated as a _failure_ from CI's perspective. Let us discover
> if we have accidentally "fixed" any of these test cases and update
> the tests accordingly.

In other words, we do not want to lose the "TODO fixed" we have been
getting out of test_expect_failure, which I agree with.  I am not
sure if Phillip's outline had that feature.

> I can predict writing a test case with multiple test_todo lines
> that need to be updated to drop the test_todo helpers one-by-one
> as a change is being introduced.

Yes.
Phillip Wood March 24, 2022, 11:24 a.m. UTC | #10
On 23/03/2022 22:13, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> The thing I'm hoping to see from a final version is that a top-level
>> helper like test_expect_todo will expect at least one test_todo
>> helper to be executed inside of the test (perhaps communicated by
>> setting a special GIT_TEST_* environment variable?) and if any of
> 
> I was hoping that we can do without test_expect_todo.
> test_expect_success can turn itself into test_expect_todo when it
> sees test_todo is invoked even once in it.  And Phillip's outline
> actually implements the idea, if I am not mistaken.

You are correct, there is no test_expect_todo in the outline I posted, 
test_todo lives within test_expect_success and works like test_must_fail.
>> the test_todo lines change from fail to pass, then that is
>> communicated as a _failure_ from CI's perspective. Let us discover
>> if we have accidentally "fixed" any of these test cases and update
>> the tests accordingly.
> 
> In other words, we do not want to lose the "TODO fixed" we have been
> getting out of test_expect_failure, which I agree with. 

I read Stolee's comments the other way, that we want the test harness to 
see the test fail rather passing as it does with the "TODO fixed" 
feature. In one of my early contributions I inadvertently fixed a 
submodule test and did not realize until someone pointed it out because 
the CI passes rather than fails when a test_expect_failure is fixed.

> I am not
> sure if Phillip's outline had that feature.

In my outline the test fails if any command marked by test_todo is 
successful and test_todo prints a messaging saying the command was 
unexpectedly successful. Implementing the "TODO fixed" feature gets 
tricky when there is more than one test_todo within a single 
test_expect_success - what if one is test_todo command is successful and 
the others fail?


Best Wishes

Phillip

>> I can predict writing a test case with multiple test_todo lines
>> that need to be updated to drop the test_todo helpers one-by-one
>> as a change is being introduced.
> 
> Yes.
diff mbox series

Patch

diff --git a/t/README b/t/README
index f48e0542cdc..e0aa8ebb0eb 100644
--- a/t/README
+++ b/t/README
@@ -805,10 +805,28 @@  see test-lib-functions.sh for the full list and their options.
 	test_expect_success PERL,PYTHON 'yo dawg' \
 	    ' test $(perl -E 'print eval "1 +" . qx[python -c "print 2"]') == "4" '
 
+ - test_expect_todo [<prereq>] <message> <script>
+
+   A "test_expect_success" which on "success" will mark the test as a
+   TODO test, and on failure will emit a real failure.
+
+   This should be used to mark a test whose exact bad behavior is
+   known, but whose outcome isn't preferred, to distinguish it from
+   those tests that use "test_expect_success" to indicate known and
+   preferred behavior.
+
+   Like test_expect_success this function can optionally use a three
+   argument invocation with a prerequisite as the first argument.
+
  - test_expect_failure [<prereq>] <message> <script>
 
    This is NOT the opposite of test_expect_success, but is used
-   to mark a test that demonstrates a known breakage.  Unlike
+   to mark a test that demonstrates a known breakage whose exact failure
+   behavior isn't predictable.
+
+   If the exact breakage is known the "test_expect_todo" function
+   should be used instead. Usethis function if it's hard to pin down
+   the exact nature of the failure. Unlike
    the usual test_expect_success tests, which say "ok" on
    success and "FAIL" on failure, this will say "FIXED" on
    success and "still broken" on failure.  Failures from these
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index b007f0efef2..e46638f1f7b 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -176,6 +176,8 @@  test_expect_success 'subtest: mixed results: a mixture of all possible results'
 	test_expect_failure "pretend we have a known breakage" "false"
 	test_expect_failure "pretend we have a known breakage" "false"
 	test_expect_failure "pretend we have fixed a known breakage" "true"
+	test_expect_todo "pretend we have a known TODO" "true"
+	test_expect_todo "pretend we have a bad TODO" "false"
 	test_done
 	EOF
 	check_sub_test_lib_test mixed-results2 <<-\EOF
@@ -192,10 +194,13 @@  test_expect_success 'subtest: mixed results: a mixture of all possible results'
 	> not ok 8 - pretend we have a known breakage # TODO known breakage
 	> not ok 9 - pretend we have a known breakage # TODO known breakage
 	> ok 10 - pretend we have fixed a known breakage # TODO known breakage vanished
+	> not ok 11 - pretend we have a known TODO # TODO known breakage
+	> not ok 12 - pretend we have a bad TODO (broken '\''test_expect_todo'\''!)
+	> #	false
 	> # 1 known breakage(s) vanished; please update test(s)
-	> # still have 2 known breakage(s)
-	> # failed 3 among remaining 7 test(s)
-	> 1..10
+	> # still have 3 known breakage(s)
+	> # failed 4 among remaining 8 test(s)
+	> 1..12
 	EOF
 '
 
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index bc89371f534..b7023f5644c 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -112,8 +112,8 @@  test_expect_success 'clone --local detects missing objects' '
 	test_must_fail git clone --local missing missing-checkout
 '
 
-test_expect_failure 'clone --local detects misnamed objects' '
-	test_must_fail git clone --local misnamed misnamed-checkout
+test_expect_todo 'clone --local detects misnamed objects' '
+	git clone --local misnamed misnamed-checkout
 '
 
 test_expect_success 'fetch into corrupted repo with index-pack' '
diff --git a/t/t7815-grep-binary.sh b/t/t7815-grep-binary.sh
index ac871287c03..2d17e05036e 100755
--- a/t/t7815-grep-binary.sh
+++ b/t/t7815-grep-binary.sh
@@ -64,8 +64,8 @@  test_expect_success 'git grep ile a' '
 	git grep ile a
 '
 
-test_expect_failure 'git grep .fi a' '
-	git grep .fi a
+test_expect_todo 'git grep .fi a' '
+	test_must_fail git grep .fi a
 '
 
 test_expect_success 'grep respects binary diff attribute' '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 0f439c99d61..a219b126d93 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -718,11 +718,13 @@  test_verify_prereq () {
 	BUG "'$test_prereq' does not look like a prereq"
 }
 
-test_expect_failure () {
+_test_expect_todo () {
+	local func=$1
+	shift
 	test_start_
 	test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq=
 	test "$#" = 2 ||
-	BUG "not 2 or 3 parameters to test-expect-failure"
+	BUG "not 2 or 3 parameters to $func"
 	test_verify_prereq
 	export test_prereq
 	if ! test_skip "$@"
@@ -730,14 +732,32 @@  test_expect_failure () {
 		say >&3 "checking known breakage of $TEST_NUMBER.$test_count '$1': $2"
 		if test_run_ "$2" expecting_failure
 		then
-			test_known_broken_ok_ "$1"
+			case "$func" in
+			test_expect_todo) test_known_broken_ok_ "$func" "$1" ;;
+			test_expect_failure) test_known_broken_ok_ "$func" "$1" ;;
+			esac
 		else
-			test_known_broken_failure_ "$1"
+			case "$func" in
+			test_expect_todo)
+				test_title_=$1
+				shift
+				test_failure_ "$test_title_ (broken 'test_expect_todo'!)" "$@"
+				;;
+			test_expect_failure) test_known_broken_failure_ "$func" "$1" ;;
+			esac
 		fi
 	fi
 	test_finish_
 }
 
+test_expect_failure () {
+	_test_expect_todo test_expect_failure "$@"
+}
+
+test_expect_todo () {
+	_test_expect_todo test_expect_todo "$@"
+}
+
 test_expect_success () {
 	test_start_
 	test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq=
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 9af5fb7674d..fb23a6f6682 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -792,21 +792,45 @@  test_failure_ () {
 }
 
 test_known_broken_ok_ () {
+	local func=$1
+	shift
+
 	if test -n "$write_junit_xml"
 	then
 		write_junit_xml_testcase "$* (breakage fixed)"
 	fi
-	test_fixed=$(($test_fixed+1))
-	say_color error "ok $test_count - $@ # TODO known breakage vanished"
+
+	if test "$func" = "test_expect_todo"
+	then
+		# test_expect_todo
+		test_broken=$(($test_broken+1))
+		say_color warn "not ok $test_count - $@ # TODO known breakage"
+	else
+		# test_expect_failure
+		test_fixed=$(($test_fixed+1))
+		say_color error "ok $test_count - $@ # TODO known breakage vanished"
+	fi
 }
 
 test_known_broken_failure_ () {
+	local func=$1
+	shift
+
 	if test -n "$write_junit_xml"
 	then
 		write_junit_xml_testcase "$* (known breakage)"
 	fi
-	test_broken=$(($test_broken+1))
-	say_color warn "not ok $test_count - $@ # TODO known breakage"
+
+	if test "$func" = "test_expect_todo"
+	then
+		# test_expect_todo
+		test_fixed=$(($test_fixed+1))
+		say_color error "not ok $test_count - $@ # TODO a 'known breakage' changed behavior!"
+	else
+		# test_expect_failure
+		test_broken=$(($test_broken+1))
+		say_color warn "not ok $test_count - $@ # TODO known breakage"
+	fi
 }
 
 test_debug () {