diff mbox series

[2/7] t: introduce tests for unexpected object types

Message ID ef6b4f380435d9743a56f47f68c18123bf0a0933.1554435033.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series harden unexpected object types checks | expand

Commit Message

Taylor Blau April 5, 2019, 3:37 a.m. UTC
Call an object's type "unexpected" when the actual type of an object
does not match Git's contextual expectation. For example, a tree entry
whose mode differs from the object's actual type, or a commit's parent
which is not another commit, and so on.

This can manifest itself in various unfortunate ways, including Git
SIGSEGV-ing under specific conditions. Consider the following example:
Git traverses a blob (say, via `git rev-list`), and then tries to read
out a tree-entry which lists that object as something other than a blob.
In this case, `lookup_blob()` will return NULL, and the subsequent
dereference will result in a SIGSEGV.

Introduce tests that present objects of "unexpected" type in the above
fashion to 'git rev-list'. Mark as failures the combinations that are
already broken (i.e., they exhibit the segfault described above). In the
cases that are not broken (i.e., they have NULL-ness checks or similar),
mark these as expecting success.

Let A be the object referenced with an unexpected type, and B be the
object doing the referencing. Do the following:

  - test 'git rev-list --objects A B'. This causes A to be "cached", and
    presents the above scenario.

Likewise, if we have a tree entry that claims to be a tree (for example)
but points to another object type (say, a blob), there are two ways we
might find out:

  - when we call lookup_tree(), we might find that we've already seen
    the object referenced as another type, in which case we'd get NULL

  - we call lookup_tree() successfully, but when we try to read the
    object, we find out it's something else.

We should check that we behave sensibly in both cases (especially
because it is easy for a malicious actor to provoke one case or the
other).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t6102-rev-list-unexpected-objects.sh | 123 +++++++++++++++++++++++++
 1 file changed, 123 insertions(+)
 create mode 100755 t/t6102-rev-list-unexpected-objects.sh

Comments

SZEDER Gábor April 5, 2019, 10:50 a.m. UTC | #1
On Thu, Apr 04, 2019 at 08:37:44PM -0700, Taylor Blau wrote:
> diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
> new file mode 100755
> index 0000000000..472b08528a
> --- /dev/null
> +++ b/t/t6102-rev-list-unexpected-objects.sh
> @@ -0,0 +1,123 @@
> +#!/bin/sh
> +
> +test_description='git rev-list should handle unexpected object types'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup well-formed objects' '
> +	blob="$(printf "foo" | git hash-object -w --stdin)" &&
> +	tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
> +	commit="$(git commit-tree $tree -m "first commit")"
> +'
> +
> +test_expect_success 'setup unexpected non-blob entry' '
> +	printf "100644 foo\0$(echo $tree | hex2oct)" >broken-tree &&
> +	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
> +'
> +
> +test_expect_failure 'traverse unexpected non-blob entry (lone)' '
> +	test_must_fail git rev-list --objects $broken_tree
> +'
> +
> +test_expect_failure 'traverse unexpected non-blob entry (seen)' '
> +	test_must_fail git rev-list --objects $tree $broken_tree
> +'
> +
> +test_expect_success 'setup unexpected non-tree entry' '
> +	printf "40000 foo\0$(echo $blob | hex2oct)" >broken-tree &&
> +	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
> +'
> +
> +test_expect_failure 'traverse unexpected non-tree entry (lone)' '
> +	test_must_fail git rev-list --objects $broken_tree
> +'
> +
> +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
> +	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1

This test saves standard output and error, but doesn't look at them.

> +'
> +
> +test_expect_success 'setup unexpected non-commit parent' '
> +	git cat-file commit $commit |
> +		perl -lpe "/^author/ && print q(parent $blob)" \
> +		>broken-commit &&

Don't run git commands upstream of a pipe, because the pipe hides
their exit code.  This applies to several other tests below as well.

Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
dependency?

> +	broken_commit="$(git hash-object -w --literally -t commit \
> +		broken-commit)"
> +'
> +
> +test_expect_success 'traverse unexpected non-commit parent (lone)' '
> +	test_must_fail git rev-list --objects $broken_commit >output 2>&1 &&
> +	test_i18ngrep "not a commit" output

Please make sure that this "not a commit" message goes to the file
descriptor it is supposed to, i.e., assuming it's part of an error
message:

  test_must_fail git rev-list .... 2>err &&
  test_i18ngrep "..." err

This applies to several other tests below and in other patches as
well.

> +'
> +
> +test_expect_success 'traverse unexpected non-commit parent (seen)' '
> +	test_must_fail git rev-list --objects $commit $broken_commit \
> +		>output 2>&1 &&
> +	test_i18ngrep "not a commit" output
> +'
> +
> +test_expect_success 'setup unexpected non-tree root' '
> +	git cat-file commit $commit |
> +	sed -e "s/$tree/$blob/" >broken-commit &&
> +	broken_commit="$(git hash-object -w --literally -t commit \
> +		broken-commit)"
> +'
> +
> +test_expect_failure 'traverse unexpected non-tree root (lone)' '
> +	test_must_fail git rev-list --objects $broken_commit
> +'
> +
> +test_expect_failure 'traverse unexpected non-tree root (seen)' '
> +	test_must_fail git rev-list --objects $blob $broken_commit
> +'
> +
> +test_expect_success 'setup unexpected non-commit tag' '
> +	git tag -a -m "tagged commit" tag $commit &&
> +	test_when_finished "git tag -d tag" &&
> +	git cat-file -p tag | sed -e "s/$commit/$blob/" >broken-tag &&
> +	tag=$(git hash-object -w --literally -t tag broken-tag)
> +'
> +
> +test_expect_success 'traverse unexpected non-commit tag (lone)' '
> +	test_must_fail git rev-list --objects $tag
> +'
> +
> +test_expect_success 'traverse unexpected non-commit tag (seen)' '
> +	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
> +	test_i18ngrep "not a commit" output
> +'
> +
> +test_expect_success 'setup unexpected non-tree tag' '
> +	git tag -a -m "tagged tree" tag $tree &&
> +	test_when_finished "git tag -d tag" &&
> +	git cat-file -p tag |
> +	sed -e "s/$tree/$blob/" >broken-tag &&
> +	tag=$(git hash-object -w --literally -t tag broken-tag)
> +'
> +
> +test_expect_success 'traverse unexpected non-tree tag (lone)' '
> +	test_must_fail git rev-list --objects $tag
> +'
> +
> +test_expect_success 'traverse unexpected non-tree tag (seen)' '
> +	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
> +	test_i18ngrep "not a tree" output
> +'
> +
> +test_expect_success 'setup unexpected non-blob tag' '
> +	git tag -a -m "tagged blob" tag $blob &&
> +	test_when_finished "git tag -d tag" &&
> +	git cat-file -p tag |
> +	sed -e "s/$blob/$commit/" >broken-tag &&
> +	tag=$(git hash-object -w --literally -t tag broken-tag)
> +'
> +
> +test_expect_failure 'traverse unexpected non-blob tag (lone)' '
> +	test_must_fail git rev-list --objects $tag
> +'
> +
> +test_expect_success 'traverse unexpected non-blob tag (seen)' '
> +	test_must_fail git rev-list --objects $commit $tag >output 2>&1 &&
> +	test_i18ngrep "not a blob" output
> +'
> +
> +test_done
> -- 
> 2.21.0.203.g358da99528
>
Jeff King April 5, 2019, 6:24 p.m. UTC | #2
On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:

> > +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
> > +	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
> 
> This test saves standard output and error, but doesn't look at them.

I think we want to be checking for "not a tree" here, which is later
added with the fix. But either we should have the test_i18ngrep here
initially, or we should add both the redirect and the grep with the fix.

> > +test_expect_success 'setup unexpected non-commit parent' '
> > +	git cat-file commit $commit |
> > +		perl -lpe "/^author/ && print q(parent $blob)" \
> > +		>broken-commit &&
> 
> Don't run git commands upstream of a pipe, because the pipe hides
> their exit code.  This applies to several other tests below as well.

I disagree with that rule here. We're not testing "cat-file" in any
meaningful way, but just getting some stock output from a known-good
commit.

> Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> dependency?

Heh, this was actually the subject of much discussion before the patches
hit the list. If you can write such a one-liner that is both readable
and portable, please share it. I got disgusted with sed and suggested
this perl.

-Peff
Jeff King April 5, 2019, 6:31 p.m. UTC | #3
On Thu, Apr 04, 2019 at 08:37:44PM -0700, Taylor Blau wrote:

> Let A be the object referenced with an unexpected type, and B be the
> object doing the referencing. Do the following:
> 
>   - test 'git rev-list --objects A B'. This causes A to be "cached", and
>     presents the above scenario.
> 
> Likewise, if we have a tree entry that claims to be a tree (for example)
> but points to another object type (say, a blob), there are two ways we
> might find out:
> 
>   - when we call lookup_tree(), we might find that we've already seen
>     the object referenced as another type, in which case we'd get NULL
> 
>   - we call lookup_tree() successfully, but when we try to read the
>     object, we find out it's something else.
> 
> We should check that we behave sensibly in both cases (especially
> because it is easy for a malicious actor to provoke one case or the
> other).

I think our pasting together of multiple commits adding the lone/seen
cases ended up in some redundancy in the description. In particular, I'm
not sure what the first paragraph/bullet quoted above is trying to say,
as it corresponds to the second bullet in the later list. Maybe collapse
them together like:

  We might hit an unexpected type in two different ways (imagine we have
  a tree entry that claims to be a tree but actually points to a blob):

    - when we call lookup_tree(), we might find that we've already seen
      the object referenced as a blob, in which case we'd get NULL. We
      can exercise this with "git rev-list --objects $blob $tree", which
      guarantees that the blob will have been parsed before we look in
      the tree. These tests are marked as "seen" in the test script.

    - we call lookup_tree() successfully, but when we try to read the
      object, we find out it's something else. We construct our tests
      such that $blob is not otherwise mentioned in $tree. These tests
      are marked as "lone" in the script.

-Peff
SZEDER Gábor April 5, 2019, 6:42 p.m. UTC | #4
On Fri, Apr 05, 2019 at 02:24:12PM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> 
> > > +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
> > > +	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
> > 
> > This test saves standard output and error, but doesn't look at them.
> 
> I think we want to be checking for "not a tree" here, which is later
> added with the fix. But either we should have the test_i18ngrep here
> initially, or we should add both the redirect and the grep with the fix.
> 
> > > +test_expect_success 'setup unexpected non-commit parent' '
> > > +	git cat-file commit $commit |
> > > +		perl -lpe "/^author/ && print q(parent $blob)" \
> > > +		>broken-commit &&
> > 
> > Don't run git commands upstream of a pipe, because the pipe hides
> > their exit code.  This applies to several other tests below as well.
> 
> I disagree with that rule here. We're not testing "cat-file" in any
> meaningful way, but just getting some stock output from a known-good
> commit.

It makes auditing harder and sets bad example.
 
> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > dependency?
> 
> Heh, this was actually the subject of much discussion before the patches
> hit the list. If you can write such a one-liner that is both readable
> and portable, please share it. I got disgusted with sed and suggested
> this perl.

Hm, so the trivial s/// with '\n' in the replacement part is not
portable, then?  Oh, well.
Jeff King April 5, 2019, 6:52 p.m. UTC | #5
On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:

> > > Don't run git commands upstream of a pipe, because the pipe hides
> > > their exit code.  This applies to several other tests below as well.
> > 
> > I disagree with that rule here. We're not testing "cat-file" in any
> > meaningful way, but just getting some stock output from a known-good
> > commit.
> 
> It makes auditing harder and sets bad example.

I disagree that it's a bad example. Just because a reader might not
realize that it is sometimes OK and sometimes not, does not make it bad
to do so in the OK case. It means the reader needs to understand the
rule in order to correctly apply it.

I am sympathetic to the auditing issue, though.

I dunno. In this case it is not too bad to do:

  git cat-file commit $commit >good-commit &&
  perl ... <good-commit >broken-commit

but I am not sure I am on board with a blanket rule of "git must never
be on the left-hand side of a pipe".

> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > > dependency?
> > 
> > Heh, this was actually the subject of much discussion before the patches
> > hit the list. If you can write such a one-liner that is both readable
> > and portable, please share it. I got disgusted with sed and suggested
> > this perl.
> 
> Hm, so the trivial s/// with '\n' in the replacement part is not
> portable, then?  Oh, well.

I don't think it is, but I could be wrong. POSIX does say that "\n"
matches a newline in the pattern space, but nothing about it on the RHS
of a substitution. I have a vague feeling of running into problems in
the past, but I could just be misremembering.

We also tried matching /^tree/ and using "a\" to append a line, but it
was both hard to read and hit portability issues with bsd sed.

-Peff
Eric Sunshine April 5, 2019, 7:25 p.m. UTC | #6
On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> > > +   git cat-file commit $commit |
> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > > +           >broken-commit &&
>
> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > dependency?
>
> Heh, this was actually the subject of much discussion before the patches
> hit the list. If you can write such a one-liner that is both readable
> and portable, please share it. I got disgusted with sed and suggested
> this perl.

Trivial and portable 'sed' equivalent:

git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
Jeff King April 5, 2019, 8:53 p.m. UTC | #7
On Fri, Apr 05, 2019 at 03:25:43PM -0400, Eric Sunshine wrote:

> On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
> > On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> > > > +   git cat-file commit $commit |
> > > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > > > +           >broken-commit &&
> >
> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > > dependency?
> >
> > Heh, this was actually the subject of much discussion before the patches
> > hit the list. If you can write such a one-liner that is both readable
> > and portable, please share it. I got disgusted with sed and suggested
> > this perl.
> 
> Trivial and portable 'sed' equivalent:
> 
> git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"

I always forget about the hold space. That's pretty readable (though
being sed, it's terse enough that I actually think the perl is more
readable; that may be personal taste, though).

-Peff
Taylor Blau April 6, 2019, 5:23 a.m. UTC | #8
Hi Peff,

On Fri, Apr 05, 2019 at 02:31:42PM -0400, Jeff King wrote:
> On Thu, Apr 04, 2019 at 08:37:44PM -0700, Taylor Blau wrote:
>
> > Let A be the object referenced with an unexpected type, and B be the
> > object doing the referencing. Do the following:
> >
> >   - test 'git rev-list --objects A B'. This causes A to be "cached", and
> >     presents the above scenario.
> >
> > Likewise, if we have a tree entry that claims to be a tree (for example)
> > but points to another object type (say, a blob), there are two ways we
> > might find out:
> >
> >   - when we call lookup_tree(), we might find that we've already seen
> >     the object referenced as another type, in which case we'd get NULL
> >
> >   - we call lookup_tree() successfully, but when we try to read the
> >     object, we find out it's something else.
> >
> > We should check that we behave sensibly in both cases (especially
> > because it is easy for a malicious actor to provoke one case or the
> > other).
>
> I think our pasting together of multiple commits adding the lone/seen
> cases ended up in some redundancy in the description. In particular, I'm
> not sure what the first paragraph/bullet quoted above is trying to say,
> as it corresponds to the second bullet in the later list.

I agree that this is at the very least redundant, and at the most,
confusing. I think that you're right that this is the result of
squashing the commits often enough that some of this cruft stuck around
and got combined in ways that it shouldn't have.

> Maybe collapse them together like:
>
>   We might hit an unexpected type in two different ways (imagine we have
>   a tree entry that claims to be a tree but actually points to a blob):
>
>     - when we call lookup_tree(), we might find that we've already seen
>       the object referenced as a blob, in which case we'd get NULL. We
>       can exercise this with "git rev-list --objects $blob $tree", which
>       guarantees that the blob will have been parsed before we look in
>       the tree. These tests are marked as "seen" in the test script.
>
>     - we call lookup_tree() successfully, but when we try to read the
>       object, we find out it's something else. We construct our tests
>       such that $blob is not otherwise mentioned in $tree. These tests
>       are marked as "lone" in the script.

Indeed, this is much cleaner (and thanks for writing it here, and saving
me the work). I think that I will take this as-is for 2/7 in v2.

> -Peff

Thanks,
Taylor
Taylor Blau April 6, 2019, 5:31 a.m. UTC | #9
On Fri, Apr 05, 2019 at 02:24:12PM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
>
> > > +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
> > > +	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
> >
> > This test saves standard output and error, but doesn't look at them.
>
> I think we want to be checking for "not a tree" here, which is later
> added with the fix. But either we should have the test_i18ngrep here
> initially, or we should add both the redirect and the grep with the fix.

Right, pointing out that saving the standard output and error of 'git
rev-list' and then doing nothing with it as being redundant is certainly
right.

I think that the 'fix' here is to write instead:

  +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
  +	test_must_fail git rev-list --objects $blob $broken_tree
  +'

And _then_ add '>output 2>&1 &&' to the end, capturing the output, as
well as the appropriate test_i18ngrep. This matches the pattern that
we've been otherwise following in the series so far.

(FWIW, I think that this is also the result of squashing the series down
a few times...)

> > > +test_expect_success 'setup unexpected non-commit parent' '
> > > +	git cat-file commit $commit |
> > > +		perl -lpe "/^author/ && print q(parent $blob)" \
> > > +		>broken-commit &&
> >
> > Don't run git commands upstream of a pipe, because the pipe hides
> > their exit code.  This applies to several other tests below as well.
>
> I disagree with that rule here. We're not testing "cat-file" in any
> meaningful way, but just getting some stock output from a known-good
> commit.
>
> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > dependency?
>
> Heh, this was actually the subject of much discussion before the patches
> hit the list. If you can write such a one-liner that is both readable
> and portable, please share it. I got disgusted with sed and suggested
> this perl.

I admit that this gave me a chuckle, too. When preparing this series to
send it, I did something like:

  $ git rebase -x 'make && cd t && ./t6102-*.sh

but found that it was broken on the BSD sed that ships with macOS
10.14.2. I didn't notice until preparing the series for upstream because
I wrote it on my Debian 9 VM, with GNU sed (where it is not broken ;-)).

It was originally written as:

	test_expect_success 'setup unexpected non-commit parent' '
		git cat-file commit $commit | sed "/^tree/a\
		parent $blob" >broken-commit &&
		broken_commit="$(git hash-object -w --literally -t commit \
			broken-commit)"
	'

> -Peff

Thanks,
Taylor
Taylor Blau April 6, 2019, 5:33 a.m. UTC | #10
On Fri, Apr 05, 2019 at 04:53:45PM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 03:25:43PM -0400, Eric Sunshine wrote:
>
> > On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
> > > On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> > > > > +   git cat-file commit $commit |
> > > > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > > > > +           >broken-commit &&
> > >
> > > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > > > dependency?
> > >
> > > Heh, this was actually the subject of much discussion before the patches
> > > hit the list. If you can write such a one-liner that is both readable
> > > and portable, please share it. I got disgusted with sed and suggested
> > > this perl.
> >
> > Trivial and portable 'sed' equivalent:
> >
> > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
>
> I always forget about the hold space. That's pretty readable (though
> being sed, it's terse enough that I actually think the perl is more
> readable; that may be personal taste, though).

Ah, very nice. Thanks Eric, your sed-fu is much stronger than mine ;-).

I share Peff's view that this might be less readable than its Perl
counterpart, but am similarly sympathetic to the notion that more Perl
is a bad example in 't'.

I think that I'll take this for v2 and get rid of the Perl.

Thanks,
Taylor
Ævar Arnfjörð Bjarmason April 7, 2019, 9 p.m. UTC | #11
On Fri, Apr 05 2019, Jeff King wrote:

> On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
>
>> > > Don't run git commands upstream of a pipe, because the pipe hides
>> > > their exit code.  This applies to several other tests below as well.
>> >
>> > I disagree with that rule here. We're not testing "cat-file" in any
>> > meaningful way, but just getting some stock output from a known-good
>> > commit.
>>
>> It makes auditing harder and sets bad example.
>
> I disagree that it's a bad example. Just because a reader might not
> realize that it is sometimes OK and sometimes not, does not make it bad
> to do so in the OK case. It means the reader needs to understand the
> rule in order to correctly apply it.

FWIW I thought the rule was something to the effect of "we're hacking on
git, any change might segfault in some unexpected test, let's make sure
that fails right away", hence the blanket rule in t/README against "!
git <cmd>" in favor of "test_must_fail git <cmd>", and "don't feed the
output of a git command to a pipe" documented right after that.

> I am sympathetic to the auditing issue, though.
>
> I dunno. In this case it is not too bad to do:
>
>   git cat-file commit $commit >good-commit &&
>   perl ... <good-commit >broken-commit
>
> but I am not sure I am on board with a blanket rule of "git must never
> be on the left-hand side of a pipe".
>
>> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
>> > > dependency?
>> >
>> > Heh, this was actually the subject of much discussion before the patches
>> > hit the list. If you can write such a one-liner that is both readable
>> > and portable, please share it. I got disgusted with sed and suggested
>> > this perl.
>>
>> Hm, so the trivial s/// with '\n' in the replacement part is not
>> portable, then?  Oh, well.
>
> I don't think it is, but I could be wrong. POSIX does say that "\n"
> matches a newline in the pattern space, but nothing about it on the RHS
> of a substitution. I have a vague feeling of running into problems in
> the past, but I could just be misremembering.
>
> We also tried matching /^tree/ and using "a\" to append a line, but it
> was both hard to read and hit portability issues with bsd sed.
>
> -Peff
Junio C Hamano April 8, 2019, 5:27 a.m. UTC | #12
Jeff King <peff@peff.net> writes:

> I don't think it is, but I could be wrong. POSIX does say that "\n"
> matches a newline in the pattern space, but nothing about it on the RHS
> of a substitution. I have a vague feeling of running into problems in
> the past, but I could just be misremembering.

Yes, it was quite bad on minority platforms like AIX when I had to
touch it the last time.
Junio C Hamano April 8, 2019, 6:44 a.m. UTC | #13
Eric Sunshine <sunshine@sunshineco.com> writes:

> On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
>> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
>> > > +   git cat-file commit $commit |
>> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
>> > > +           >broken-commit &&
>>
>> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
>> > dependency?
>>
>> Heh, this was actually the subject of much discussion before the patches
>> hit the list. If you can write such a one-liner that is both readable
>> and portable, please share it. I got disgusted with sed and suggested
>> this perl.
>
> Trivial and portable 'sed' equivalent:
>
> git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"

Looks good.  I had a bit of head scratching moment when I saw that
"perl -lpe" one-liner; this sed expression may not be crystal clear
to those who are not used to, but it is not so bad, either.
Taylor Blau April 9, 2019, 2:29 a.m. UTC | #14
Hi Ævar,

On Sun, Apr 07, 2019 at 11:00:19PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Fri, Apr 05 2019, Jeff King wrote:
>
> > On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
> >
> >> > > Don't run git commands upstream of a pipe, because the pipe hides
> >> > > their exit code.  This applies to several other tests below as well.
> >> >
> >> > I disagree with that rule here. We're not testing "cat-file" in any
> >> > meaningful way, but just getting some stock output from a known-good
> >> > commit.
> >>
> >> It makes auditing harder and sets bad example.
> >
> > I disagree that it's a bad example. Just because a reader might not
> > realize that it is sometimes OK and sometimes not, does not make it bad
> > to do so in the OK case. It means the reader needs to understand the
> > rule in order to correctly apply it.
>
> FWIW I thought the rule was something to the effect of "we're hacking on
> git, any change might segfault in some unexpected test, let's make sure
> that fails right away", hence the blanket rule in t/README against "!
> git <cmd>" in favor of "test_must_fail git <cmd>", and "don't feed the
> output of a git command to a pipe" documented right after that.

I have some loosely-held opinions on this. Certainly knowing if a change
to git caused a segfault in some test is something that we would want to
know about, though I'm not sure we're loosing anything by putting 'git'
on the left-hand side of a pipe here.

  - If someone wrote a change to git that introduced a segfault in 'git
    cat-file', I'm sure that this would not be the only place that in
    the suite that would break because of it. Presumably, at least one
    of those places uses 'test_must_fail git ...' instead

  - At the very least, 'broken-commit' doesn't contain what it needs to,
    the test breaks in another way (albeit further from the actual
    defect), and the developer finds out about their bug that way.

In any case, these above two might be moot anyways, because I'm almost
certain that it will be a rarity for someone to _only_ run t6102, unless
it is included in a blanket 'make' from within 't'.

> > I am sympathetic to the auditing issue, though.

Just to satisfy my curiosity, I put git on the left-hand side of a pipe
to see how many such examples exist today:

  ~/g/git (master) $ git grep 'git cat-file .* |' -- t/t*.sh | wc -l
      179

I'm not going to claim that changing 179 -> 180 is neutral or bad, but
it's interesting nonetheless.

> > I dunno. In this case it is not too bad to do:
> >
> >   git cat-file commit $commit >good-commit &&
> >   perl ... <good-commit >broken-commit
> >
> > but I am not sure I am on board with a blanket rule of "git must never
> > be on the left-hand side of a pipe".

I think that I mostly agree with Peff here for the reasons above.

That all said, I don't really want to hold up the series for this alone.
Since there don't seem to be many other comments or issues, I would be
quite happy to do whatever people are most in favor of.

I basically don't really feel strongly about writing:

  git cat-file commit $commit | sed -e ... >broken-commit

as opposed to:

  git cat-file commit $commit >good-commit &&
  sed -e '' <commit >bad-commit

> >> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> >> > > dependency?
> >> >
> >> > Heh, this was actually the subject of much discussion before the patches
> >> > hit the list. If you can write such a one-liner that is both readable
> >> > and portable, please share it. I got disgusted with sed and suggested
> >> > this perl.
> >>
> >> Hm, so the trivial s/// with '\n' in the replacement part is not
> >> portable, then?  Oh, well.
> >
> > I don't think it is, but I could be wrong. POSIX does say that "\n"
> > matches a newline in the pattern space, but nothing about it on the RHS
> > of a substitution. I have a vague feeling of running into problems in
> > the past, but I could just be misremembering.
> >
> > We also tried matching /^tree/ and using "a\" to append a line, but it
> > was both hard to read and hit portability issues with bsd sed.

I think that all of this discussion is addressed elsewhere in thread, but
the gist is that Eric provided a suitable sed invocation that I am going
to use instead of Peff's Perl.

> > -Peff

Thanks,
Taylor
Taylor Blau April 9, 2019, 2:30 a.m. UTC | #15
Hi Junio,


On Mon, Apr 08, 2019 at 03:44:25PM +0900, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
> > On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
> >> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> >> > > +   git cat-file commit $commit |
> >> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> >> > > +           >broken-commit &&
> >>
> >> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> >> > dependency?
> >>
> >> Heh, this was actually the subject of much discussion before the patches
> >> hit the list. If you can write such a one-liner that is both readable
> >> and portable, please share it. I got disgusted with sed and suggested
> >> this perl.
> >
> > Trivial and portable 'sed' equivalent:
> >
> > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
>
> Looks good.  I had a bit of head scratching moment when I saw that
> "perl -lpe" one-liner; this sed expression may not be crystal clear
> to those who are not used to, but it is not so bad, either.

Should I take this as your endorsement of putting 'git' on the left-hand
side of a pipe? ;-).

Thanks,
Taylor
Eric Sunshine April 9, 2019, 3:28 a.m. UTC | #16
On Mon, Apr 8, 2019 at 10:31 PM Taylor Blau <me@ttaylorr.com> wrote:
> On Mon, Apr 08, 2019 at 03:44:25PM +0900, Junio C Hamano wrote:
> > Eric Sunshine <sunshine@sunshineco.com> writes:
> > >> > > +   git cat-file commit $commit |
> > >> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > >> > > +           >broken-commit &&
> > >
> > > Trivial and portable 'sed' equivalent:
> > >
> > > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
> >
> > Looks good.  I had a bit of head scratching moment when I saw that
> > "perl -lpe" one-liner; this sed expression may not be crystal clear
> > to those who are not used to, but it is not so bad, either.
>
> Should I take this as your endorsement of putting 'git' on the left-hand
> side of a pipe? ;-).

I suspect that Junio's "Looks good" was referring to the 'sed expression.

With all the recent work of moving away from having Git upstream of a
pipe, let's not intentionally introduce a new instance. I wrote the
example 'sed' expression that way merely to mirror how the original
'perl' version was written to make it easier to see the equivalence
(not because it was intended as an endorsement of having Git upstream
of a pipe).
Taylor Blau April 9, 2019, 5:08 a.m. UTC | #17
Hi Eric,

On Mon, Apr 08, 2019 at 11:28:19PM -0400, Eric Sunshine wrote:
> On Mon, Apr 8, 2019 at 10:31 PM Taylor Blau <me@ttaylorr.com> wrote:
> > On Mon, Apr 08, 2019 at 03:44:25PM +0900, Junio C Hamano wrote:
> > > Eric Sunshine <sunshine@sunshineco.com> writes:
> > > >> > > +   git cat-file commit $commit |
> > > >> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > > >> > > +           >broken-commit &&
> > > >
> > > > Trivial and portable 'sed' equivalent:
> > > >
> > > > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
> > >
> > > Looks good.  I had a bit of head scratching moment when I saw that
> > > "perl -lpe" one-liner; this sed expression may not be crystal clear
> > > to those who are not used to, but it is not so bad, either.
> >
> > Should I take this as your endorsement of putting 'git' on the left-hand
> > side of a pipe? ;-).
>
> I suspect that Junio's "Looks good" was referring to the 'sed expression.

I think that you are right -- and I'll happily _not_ introduce more Git
on the left-hand-side of a pipe instances.

I noticed a few more instances in t6102 where we do something similar
to:

  git cat-file -p <something> | sed ... >broken-<something> &&

I wrote the following patch, which I've folded into my local copy (and
will send with v2):

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index b9d82f9542..15072ecce3 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -7,7 +7,8 @@ test_description='git rev-list should handle unexpected object types'
 test_expect_success 'setup well-formed objects' '
 	blob="$(printf "foo" | git hash-object -w --stdin)" &&
 	tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
-	commit="$(git commit-tree $tree -m "first commit")"
+	commit="$(git commit-tree $tree -m "first commit")" &&
+	git cat-file commit $commit >good-commit
 '

 test_expect_success 'setup unexpected non-blob entry' '
@@ -37,8 +38,8 @@ test_expect_failure 'traverse unexpected non-tree entry (seen)' '
 '

 test_expect_success 'setup unexpected non-commit parent' '
-	git cat-file commit $commit | \
-		sed "/^author/ { h; s/.*/parent $blob/; G; }" >broken-commit &&
+	sed "/^author/ { h; s/.*/parent $blob/; G; }" <good-commit \
+		>broken-commit &&
 	broken_commit="$(git hash-object -w --literally -t commit \
 		broken-commit)"
 '
@@ -55,8 +56,7 @@ test_expect_success 'traverse unexpected non-commit parent (seen)' '
 '

 test_expect_success 'setup unexpected non-tree root' '
-	git cat-file commit $commit |
-	sed -e "s/$tree/$blob/" >broken-commit &&
+	sed -e "s/$tree/$blob/" <good-commit >broken-commit &&
 	broken_commit="$(git hash-object -w --literally -t commit \
 		broken-commit)"
 '
@@ -71,8 +71,9 @@ test_expect_failure 'traverse unexpected non-tree root (seen)' '

 test_expect_success 'setup unexpected non-commit tag' '
 	git tag -a -m "tagged commit" tag $commit &&
+	git cat-file tag tag >good-tag &&
 	test_when_finished "git tag -d tag" &&
-	git cat-file -p tag | sed -e "s/$commit/$blob/" >broken-tag &&
+	sed -e "s/$commit/$blob/" <good-tag >broken-tag &&
 	tag=$(git hash-object -w --literally -t tag broken-tag)
 '

@@ -87,9 +88,9 @@ test_expect_success 'traverse unexpected non-commit tag (seen)' '

 test_expect_success 'setup unexpected non-tree tag' '
 	git tag -a -m "tagged tree" tag $tree &&
+	git cat-file tag tag >good-tag &&
 	test_when_finished "git tag -d tag" &&
-	git cat-file -p tag |
-	sed -e "s/$tree/$blob/" >broken-tag &&
+	sed -e "s/$tree/$blob/" <good-tag >broken-tag &&
 	tag=$(git hash-object -w --literally -t tag broken-tag)
 '

@@ -104,9 +105,9 @@ test_expect_success 'traverse unexpected non-tree tag (seen)' '

 test_expect_success 'setup unexpected non-blob tag' '
 	git tag -a -m "tagged blob" tag $blob &&
+	git cat-file tag tag >good-tag &&
 	test_when_finished "git tag -d tag" &&
-	git cat-file -p tag |
-	sed -e "s/$blob/$commit/" >broken-tag &&
+	sed -e "s/$blob/$commit/" <good-tag >broken-tag &&
 	tag=$(git hash-object -w --literally -t tag broken-tag)
 '

> With all the recent work of moving away from having Git upstream of a
> pipe, let's not intentionally introduce a new instance. I wrote the
> example 'sed' expression that way merely to mirror how the original
> 'perl' version was written to make it easier to see the equivalence
> (not because it was intended as an endorsement of having Git upstream
> of a pipe).

I see, and thank you for the clarification. Let me know if you like the
patch above.

Thanks,
Taylor
Eric Sunshine April 9, 2019, 8:02 a.m. UTC | #18
On Tue, Apr 9, 2019 at 1:09 AM Taylor Blau <me@ttaylorr.com> wrote:
> On Mon, Apr 08, 2019 at 11:28:19PM -0400, Eric Sunshine wrote:
> > I suspect that Junio's "Looks good" was referring to the 'sed expression.
>
> I think that you are right -- and I'll happily _not_ introduce more Git
> on the left-hand-side of a pipe instances.
>
> I noticed a few more instances in t6102 [...]

Indeed, SZEDER mentioned those in [1]:

    Don't run git commands upstream of a pipe, because the pipe
    hides their exit code.  This applies to several other tests
    below as well.

[1]: http://public-inbox.org/git/20190405105033.GT32732@szeder.dev/

> I wrote the following patch, which I've folded into my local copy (and
> will send with v2):
>
> > With all the recent work of moving away from having Git upstream of a
> > pipe, let's not intentionally introduce a new instance. I wrote the
> > example 'sed' expression that way merely to mirror how the original
> > 'perl' version was written to make it easier to see the equivalence
> > (not because it was intended as an endorsement of having Git upstream
> > of a pipe).
>
> I see, and thank you for the clarification. Let me know if you like the
> patch above.

Looks fine. Thanks.
Ævar Arnfjörð Bjarmason April 9, 2019, 9:14 a.m. UTC | #19
On Tue, Apr 09 2019, Taylor Blau wrote:

> Hi Ævar,
>
> On Sun, Apr 07, 2019 at 11:00:19PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Apr 05 2019, Jeff King wrote:
>>
>> > On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
>> >
>> >> > > Don't run git commands upstream of a pipe, because the pipe hides
>> >> > > their exit code.  This applies to several other tests below as well.
>> >> >
>> >> > I disagree with that rule here. We're not testing "cat-file" in any
>> >> > meaningful way, but just getting some stock output from a known-good
>> >> > commit.
>> >>
>> >> It makes auditing harder and sets bad example.
>> >
>> > I disagree that it's a bad example. Just because a reader might not
>> > realize that it is sometimes OK and sometimes not, does not make it bad
>> > to do so in the OK case. It means the reader needs to understand the
>> > rule in order to correctly apply it.
>>
>> FWIW I thought the rule was something to the effect of "we're hacking on
>> git, any change might segfault in some unexpected test, let's make sure
>> that fails right away", hence the blanket rule in t/README against "!
>> git <cmd>" in favor of "test_must_fail git <cmd>", and "don't feed the
>> output of a git command to a pipe" documented right after that.
>
> I have some loosely-held opinions on this. Certainly knowing if a change
> to git caused a segfault in some test is something that we would want to
> know about, though I'm not sure we're loosing anything by putting 'git'
> on the left-hand side of a pipe here.
>
>   - If someone wrote a change to git that introduced a segfault in 'git
>     cat-file', I'm sure that this would not be the only place that in
>     the suite that would break because of it. Presumably, at least one
>     of those places uses 'test_must_fail git ...' instead
>
>   - At the very least, 'broken-commit' doesn't contain what it needs to,
>     the test breaks in another way (albeit further from the actual
>     defect), and the developer finds out about their bug that way.
>
> In any case, these above two might be moot anyways, because I'm almost
> certain that it will be a rarity for someone to _only_ run t6102, unless
> it is included in a blanket 'make' from within 't'.

First. I realize we're talking about the light fixtures in the bike shed
at this point, but with that in mind...

I just think it's useful as a general rule, particularly since with the
"special setups" in the test mode we've found that all sorts of odd
tests we didn't expect to stress test some features turned out to cover
edge cases of them, e.g. when "GIT_TEST_COMMIT_GRAPH" was added we found
that a bunch of random stuff segfaulted all over the place.

So it's hard to say with any confidence in advance that something isn't
going to stress git in some unusual way and isn't useful to guard for
segfaults.

Of course if and when it segfaults it's likely to be seen by subsequent
tests. In this case I'll note that if git fails we'll happily run not
only perl/sed, but then hash-object the subsequent empty file, and then
(presumably) fail in the next test.

>> > I am sympathetic to the auditing issue, though.
>
> Just to satisfy my curiosity, I put git on the left-hand side of a pipe
> to see how many such examples exist today:
>
>   ~/g/git (master) $ git grep 'git cat-file .* |' -- t/t*.sh | wc -l
>       179
>
> I'm not going to claim that changing 179 -> 180 is neutral or bad, but
> it's interesting nonetheless.

Separate from the question of if we generally agree that some value of
"Y" makes for good coding style or not, we're always going to be in some
flux state where a bunch of examples in our existing codebase contradict
that, particularly in the test suite.

I think that's a bit unfortunate in some ways. It's the result of the
default "policy" that refactoring for refactoring's sakes is generally
frowned upon, so we don't tend to go back and mass-fix "X" to "Y" once
we agree "Y" is better than "X" for new code, just do it as we go when
new code is written, or existing tests are updated for other reasons
"while we're at it".

>> > I dunno. In this case it is not too bad to do:
>> >
>> >   git cat-file commit $commit >good-commit &&
>> >   perl ... <good-commit >broken-commit
>> >
>> > but I am not sure I am on board with a blanket rule of "git must never
>> > be on the left-hand side of a pipe".
>
> I think that I mostly agree with Peff here for the reasons above.
>
> That all said, I don't really want to hold up the series for this alone.
> Since there don't seem to be many other comments or issues, I would be
> quite happy to do whatever people are most in favor of.

FWIW this series LGTM as a whole. I think it says something about the
general quality of it that we're all in some giant nitpick thread about
perl v.s. sed and git on the LHS of a pipe or not :) I'm happy to have
it queued as-is. These test issues are minor...

> I basically don't really feel strongly about writing:
>
>   git cat-file commit $commit | sed -e ... >broken-commit
>
> as opposed to:
>
>   git cat-file commit $commit >good-commit &&
>   sed -e '' <commit >bad-commit
>
>> >> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
>> >> > > dependency?
>> >> >
>> >> > Heh, this was actually the subject of much discussion before the patches
>> >> > hit the list. If you can write such a one-liner that is both readable
>> >> > and portable, please share it. I got disgusted with sed and suggested
>> >> > this perl.
>> >>
>> >> Hm, so the trivial s/// with '\n' in the replacement part is not
>> >> portable, then?  Oh, well.
>> >
>> > I don't think it is, but I could be wrong. POSIX does say that "\n"
>> > matches a newline in the pattern space, but nothing about it on the RHS
>> > of a substitution. I have a vague feeling of running into problems in
>> > the past, but I could just be misremembering.
>> >
>> > We also tried matching /^tree/ and using "a\" to append a line, but it
>> > was both hard to read and hit portability issues with bsd sed.
>
> I think that all of this discussion is addressed elsewhere in thread, but
> the gist is that Eric provided a suitable sed invocation that I am going
> to use instead of Peff's Perl.
>
>> > -Peff
>
> Thanks,
> Taylor
Taylor Blau April 10, 2019, 1:54 a.m. UTC | #20
Hi Eric,

On Tue, Apr 09, 2019 at 04:02:23AM -0400, Eric Sunshine wrote:
> On Tue, Apr 9, 2019 at 1:09 AM Taylor Blau <me@ttaylorr.com> wrote:
> > On Mon, Apr 08, 2019 at 11:28:19PM -0400, Eric Sunshine wrote:
> > > I suspect that Junio's "Looks good" was referring to the 'sed expression.
> >
> > I think that you are right -- and I'll happily _not_ introduce more Git
> > on the left-hand-side of a pipe instances.
> >
> > I noticed a few more instances in t6102 [...]
>
> Indeed, SZEDER mentioned those in [1]:

Aha, thanks -- I'm not sure how I missed that in [1]. Credit is given
where it is due :-).

>     Don't run git commands upstream of a pipe, because the pipe
>     hides their exit code.  This applies to several other tests
>     below as well.
>
> [1]: http://public-inbox.org/git/20190405105033.GT32732@szeder.dev/
>
> > I wrote the following patch, which I've folded into my local copy (and
> > will send with v2):
> >
> > > With all the recent work of moving away from having Git upstream of a
> > > pipe, let's not intentionally introduce a new instance. I wrote the
> > > example 'sed' expression that way merely to mirror how the original
> > > 'perl' version was written to make it easier to see the equivalence
> > > (not because it was intended as an endorsement of having Git upstream
> > > of a pipe).
> >
> > I see, and thank you for the clarification. Let me know if you like the
> > patch above.
>
> Looks fine. Thanks.

Great -- I appreciate your review. I'll send v2 shortly with these
changes in it.

Thanks,
Taylor
Taylor Blau April 10, 2019, 1:59 a.m. UTC | #21
Hi Ævar,

On Tue, Apr 09, 2019 at 11:14:48AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Apr 09 2019, Taylor Blau wrote:
>
> > Hi Ævar,
> >
> > On Sun, Apr 07, 2019 at 11:00:19PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Fri, Apr 05 2019, Jeff King wrote:
> >>
> >> > On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
> >> >
> >> >> > > Don't run git commands upstream of a pipe, because the pipe hides
> >> >> > > their exit code.  This applies to several other tests below as well.
> >> >> >
> >> >> > I disagree with that rule here. We're not testing "cat-file" in any
> >> >> > meaningful way, but just getting some stock output from a known-good
> >> >> > commit.
> >> >>
> >> >> It makes auditing harder and sets bad example.
> >> >
> >> > I disagree that it's a bad example. Just because a reader might not
> >> > realize that it is sometimes OK and sometimes not, does not make it bad
> >> > to do so in the OK case. It means the reader needs to understand the
> >> > rule in order to correctly apply it.
> >>
> >> FWIW I thought the rule was something to the effect of "we're hacking on
> >> git, any change might segfault in some unexpected test, let's make sure
> >> that fails right away", hence the blanket rule in t/README against "!
> >> git <cmd>" in favor of "test_must_fail git <cmd>", and "don't feed the
> >> output of a git command to a pipe" documented right after that.
> >
> > I have some loosely-held opinions on this. Certainly knowing if a change
> > to git caused a segfault in some test is something that we would want to
> > know about, though I'm not sure we're loosing anything by putting 'git'
> > on the left-hand side of a pipe here.
> >
> >   - If someone wrote a change to git that introduced a segfault in 'git
> >     cat-file', I'm sure that this would not be the only place that in
> >     the suite that would break because of it. Presumably, at least one
> >     of those places uses 'test_must_fail git ...' instead
> >
> >   - At the very least, 'broken-commit' doesn't contain what it needs to,
> >     the test breaks in another way (albeit further from the actual
> >     defect), and the developer finds out about their bug that way.
> >
> > In any case, these above two might be moot anyways, because I'm almost
> > certain that it will be a rarity for someone to _only_ run t6102, unless
> > it is included in a blanket 'make' from within 't'.
>
> First. I realize we're talking about the light fixtures in the bike shed
> at this point, but with that in mind...

I don't mind some bike-shedding ;-).

> I just think it's useful as a general rule, particularly since with the
> "special setups" in the test mode we've found that all sorts of odd
> tests we didn't expect to stress test some features turned out to cover
> edge cases of them, e.g. when "GIT_TEST_COMMIT_GRAPH" was added we found
> that a bunch of random stuff segfaulted all over the place.
>
> So it's hard to say with any confidence in advance that something isn't
> going to stress git in some unusual way and isn't useful to guard for
> segfaults.

Yes, I think that's certainly true. There's something to be said for my
argument, too, (i.e., that it's ``probably'' just fine), but as I think
about it more, I'm less convinced that mine is a position worth holding.

Even if we catch only a bug or two on an off chance, there aren't too
many things I'd sacrifice to not make it so. In other words, I'm willing
to do quite a lot in order to get even a little indication of when
things are broken (including replacing 'git ... | sed' with 'git ... >foo &&
sed ... <foo').

> Of course if and when it segfaults it's likely to be seen by subsequent
> tests. In this case I'll note that if git fails we'll happily run not
> only perl/sed, but then hash-object the subsequent empty file, and then
> (presumably) fail in the next test.

Sure, we'll probably find out about it anyway, but still...

> >> > I am sympathetic to the auditing issue, though.
> >
> > Just to satisfy my curiosity, I put git on the left-hand side of a pipe
> > to see how many such examples exist today:
> >
> >   ~/g/git (master) $ git grep 'git cat-file .* |' -- t/t*.sh | wc -l
> >       179
> >
> > I'm not going to claim that changing 179 -> 180 is neutral or bad, but
> > it's interesting nonetheless.
>
> Separate from the question of if we generally agree that some value of
> "Y" makes for good coding style or not, we're always going to be in some
> flux state where a bunch of examples in our existing codebase contradict
> that, particularly in the test suite.
>
> I think that's a bit unfortunate in some ways. It's the result of the
> default "policy" that refactoring for refactoring's sakes is generally
> frowned upon, so we don't tend to go back and mass-fix "X" to "Y" once
> we agree "Y" is better than "X" for new code, just do it as we go when
> new code is written, or existing tests are updated for other reasons
> "while we're at it".

Is that refactoring for its own sake, though? I would personally be
happy to write a series that either (1) identifies spots in 't' where
`git` is found on the left-hand side of a pipe, (2) fixes the bad-style
invocation in those spots, or (3) both.

> >> > I dunno. In this case it is not too bad to do:
> >> >
> >> >   git cat-file commit $commit >good-commit &&
> >> >   perl ... <good-commit >broken-commit
> >> >
> >> > but I am not sure I am on board with a blanket rule of "git must never
> >> > be on the left-hand side of a pipe".
> >
> > I think that I mostly agree with Peff here for the reasons above.
> >
> > That all said, I don't really want to hold up the series for this alone.
> > Since there don't seem to be many other comments or issues, I would be
> > quite happy to do whatever people are most in favor of.
>
> FWIW this series LGTM as a whole. I think it says something about the
> general quality of it that we're all in some giant nitpick thread about
> perl v.s. sed and git on the LHS of a pipe or not :) I'm happy to have
> it queued as-is. These test issues are minor...

Thanks, I appreciate that you think the series is in good shape. Since
there haven't been too many comments other than this thread, I'm going
to send v2 now.

I decided to ban t6102 of all such 'git ... | sed' invocations, and
described the change earlier in this thread.

> > I basically don't really feel strongly about writing:
> >
> >   git cat-file commit $commit | sed -e ... >broken-commit
> >
> > as opposed to:
> >
> >   git cat-file commit $commit >good-commit &&
> >   sed -e '' <commit >bad-commit
> >
> >> >> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> >> >> > > dependency?
> >> >> >
> >> >> > Heh, this was actually the subject of much discussion before the patches
> >> >> > hit the list. If you can write such a one-liner that is both readable
> >> >> > and portable, please share it. I got disgusted with sed and suggested
> >> >> > this perl.
> >> >>
> >> >> Hm, so the trivial s/// with '\n' in the replacement part is not
> >> >> portable, then?  Oh, well.
> >> >
> >> > I don't think it is, but I could be wrong. POSIX does say that "\n"
> >> > matches a newline in the pattern space, but nothing about it on the RHS
> >> > of a substitution. I have a vague feeling of running into problems in
> >> > the past, but I could just be misremembering.
> >> >
> >> > We also tried matching /^tree/ and using "a\" to append a line, but it
> >> > was both hard to read and hit portability issues with bsd sed.
> >
> > I think that all of this discussion is addressed elsewhere in thread, but
> > the gist is that Eric provided a suitable sed invocation that I am going
> > to use instead of Peff's Perl.
> >
> >> > -Peff
> >
> > Thanks,
> > Taylor

Thanks,
Taylor
diff mbox series

Patch

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
new file mode 100755
index 0000000000..472b08528a
--- /dev/null
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -0,0 +1,123 @@ 
+#!/bin/sh
+
+test_description='git rev-list should handle unexpected object types'
+
+. ./test-lib.sh
+
+test_expect_success 'setup well-formed objects' '
+	blob="$(printf "foo" | git hash-object -w --stdin)" &&
+	tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
+	commit="$(git commit-tree $tree -m "first commit")"
+'
+
+test_expect_success 'setup unexpected non-blob entry' '
+	printf "100644 foo\0$(echo $tree | hex2oct)" >broken-tree &&
+	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (lone)' '
+	test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (seen)' '
+	test_must_fail git rev-list --objects $tree $broken_tree
+'
+
+test_expect_success 'setup unexpected non-tree entry' '
+	printf "40000 foo\0$(echo $blob | hex2oct)" >broken-tree &&
+	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (lone)' '
+	test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
+'
+
+test_expect_success 'setup unexpected non-commit parent' '
+	git cat-file commit $commit |
+		perl -lpe "/^author/ && print q(parent $blob)" \
+		>broken-commit &&
+	broken_commit="$(git hash-object -w --literally -t commit \
+		broken-commit)"
+'
+
+test_expect_success 'traverse unexpected non-commit parent (lone)' '
+	test_must_fail git rev-list --objects $broken_commit >output 2>&1 &&
+	test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'traverse unexpected non-commit parent (seen)' '
+	test_must_fail git rev-list --objects $commit $broken_commit \
+		>output 2>&1 &&
+	test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'setup unexpected non-tree root' '
+	git cat-file commit $commit |
+	sed -e "s/$tree/$blob/" >broken-commit &&
+	broken_commit="$(git hash-object -w --literally -t commit \
+		broken-commit)"
+'
+
+test_expect_failure 'traverse unexpected non-tree root (lone)' '
+	test_must_fail git rev-list --objects $broken_commit
+'
+
+test_expect_failure 'traverse unexpected non-tree root (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_commit
+'
+
+test_expect_success 'setup unexpected non-commit tag' '
+	git tag -a -m "tagged commit" tag $commit &&
+	test_when_finished "git tag -d tag" &&
+	git cat-file -p tag | sed -e "s/$commit/$blob/" >broken-tag &&
+	tag=$(git hash-object -w --literally -t tag broken-tag)
+'
+
+test_expect_success 'traverse unexpected non-commit tag (lone)' '
+	test_must_fail git rev-list --objects $tag
+'
+
+test_expect_success 'traverse unexpected non-commit tag (seen)' '
+	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
+	test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'setup unexpected non-tree tag' '
+	git tag -a -m "tagged tree" tag $tree &&
+	test_when_finished "git tag -d tag" &&
+	git cat-file -p tag |
+	sed -e "s/$tree/$blob/" >broken-tag &&
+	tag=$(git hash-object -w --literally -t tag broken-tag)
+'
+
+test_expect_success 'traverse unexpected non-tree tag (lone)' '
+	test_must_fail git rev-list --objects $tag
+'
+
+test_expect_success 'traverse unexpected non-tree tag (seen)' '
+	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
+	test_i18ngrep "not a tree" output
+'
+
+test_expect_success 'setup unexpected non-blob tag' '
+	git tag -a -m "tagged blob" tag $blob &&
+	test_when_finished "git tag -d tag" &&
+	git cat-file -p tag |
+	sed -e "s/$blob/$commit/" >broken-tag &&
+	tag=$(git hash-object -w --literally -t tag broken-tag)
+'
+
+test_expect_failure 'traverse unexpected non-blob tag (lone)' '
+	test_must_fail git rev-list --objects $tag
+'
+
+test_expect_success 'traverse unexpected non-blob tag (seen)' '
+	test_must_fail git rev-list --objects $commit $tag >output 2>&1 &&
+	test_i18ngrep "not a blob" output
+'
+
+test_done