[v2] bisect: don't use invalid oid as rev when starting

Message ID	20200924060344.15541-1-chriscool@tuxfamily.org (mailing list archive)
State	Superseded
Headers	show Return-Path: <SRS0=EnZj=DB=vger.kernel.org=git-owner@kernel.org> From: Christian Couder <christian.couder@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Christian Couder <chriscool@tuxfamily.org>, Miriam Rubio <mirucam@gmail.com>, Johannes Schindelin <Johannes.Schindelin@gmx.de>, Johannes Schindelin <johannes.schindelin@gmx.de> Subject: [PATCH v2] bisect: don't use invalid oid as rev when starting Date: Thu, 24 Sep 2020 08:03:44 +0200 Message-Id: <20200924060344.15541-1-chriscool@tuxfamily.org> In-Reply-To: <20200923170915.21748-1-chriscool@tuxfamily.org> References: <20200923170915.21748-1-chriscool@tuxfamily.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[v2] bisect: don't use invalid oid as rev when starting \| expand [v2] bisect: don't use invalid oid as rev when starting

Christian Couder Sept. 24, 2020, 6:03 a.m. UTC

In 06f5608c14 (bisect--helper: `bisect_start` shell function
partially in C, 2019-01-02), we changed the following shell
code:

-                       rev=$(git rev-parse -q --verify "$arg^{commit}") || {
-                               test $has_double_dash -eq 1 &&
-                               die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
-                               break
-                       }
-                       revs="$revs $rev"

into:

+                       char *commit_id = xstrfmt("%s^{commit}", arg);
+                       if (get_oid(commit_id, &oid) && has_double_dash)
+                               die(_("'%s' does not appear to be a valid "
+                                     "revision"), arg);
+
+                       string_list_append(&revs, oid_to_hex(&oid));
+                       free(commit_id);

In case of an invalid "arg" when "has_double_dash" is false, the old
code would "break" out of the argument loop.

In the new C code though, `oid_to_hex(&oid)` is unconditonally
appended to "revs". This is wrong first because "oid" is junk as
`get_oid(commit_id, &oid)` failed and second because it doesn't break
out of the argument loop.

Not breaking out of the argument loop means that "arg" is then not
treated as a path restriction.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
This patch is made on top of e1cfff6765 (Sixteenth batch, 2020-09-22)
and incorporates Dscho's suggestions.

Thanks to Junio and Dscho for reviewing the first version.

 builtin/bisect--helper.c    | 14 ++++++--------
 t/t6030-bisect-porcelain.sh |  7 +++++++
 2 files changed, 13 insertions(+), 8 deletions(-)

Johannes Schindelin Sept. 24, 2020, 7:49 a.m. UTC | #1

Hi Christian,

On Thu, 24 Sep 2020, Christian Couder wrote:

> In 06f5608c14 (bisect--helper: `bisect_start` shell function
> partially in C, 2019-01-02), we changed the following shell
> code:
>
> -                       rev=$(git rev-parse -q --verify "$arg^{commit}") || {
> -                               test $has_double_dash -eq 1 &&
> -                               die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
> -                               break
> -                       }
> -                       revs="$revs $rev"

These are awfully long lines. The reason is that you kept the indentation
of the diff. But that's actually not necessary, because we do not need to
apply a diff here; This code snippet is intended purely for human
consumption.

What I suggested in my adaptation of your patch was to lose the diff
markers and to decrease the insane amount of indentation to just one (and
two) horizontal tabs.

> into:
>
> +                       char *commit_id = xstrfmt("%s^{commit}", arg);
> +                       if (get_oid(commit_id, &oid) && has_double_dash)
> +                               die(_("'%s' does not appear to be a valid "
> +                                     "revision"), arg);
> +
> +                       string_list_append(&revs, oid_to_hex(&oid));
> +                       free(commit_id);
>
> In case of an invalid "arg" when "has_double_dash" is false, the old
> code would "break" out of the argument loop.
>
> In the new C code though, `oid_to_hex(&oid)` is unconditonally
> appended to "revs". This is wrong first because "oid" is junk as
> `get_oid(commit_id, &oid)` failed and second because it doesn't break
> out of the argument loop.
>
> Not breaking out of the argument loop means that "arg" is then not
> treated as a path restriction.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> This patch is made on top of e1cfff6765 (Sixteenth batch, 2020-09-22)
> and incorporates Dscho's suggestions.
>
> Thanks to Junio and Dscho for reviewing the first version.
>
>  builtin/bisect--helper.c    | 14 ++++++--------
>  t/t6030-bisect-porcelain.sh |  7 +++++++
>  2 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
> index 7dcc1b5188..f4762e1774 100644
> --- a/builtin/bisect--helper.c
> +++ b/builtin/bisect--helper.c
> @@ -484,15 +484,13 @@ static int bisect_start(struct bisect_terms *terms, const char **argv, int argc)
>  			terms->term_bad = xstrdup(arg);
>  		} else if (starts_with(arg, "--")) {
>  			return error(_("unrecognized option: '%s'"), arg);
> -		} else {
> -			char *commit_id = xstrfmt("%s^{commit}", arg);
> -			if (get_oid(commit_id, &oid) && has_double_dash)
> -				die(_("'%s' does not appear to be a valid "
> -				      "revision"), arg);
> -
> +		} else if (!get_oid_committish(arg, &oid))
>  			string_list_append(&revs, oid_to_hex(&oid));
> -			free(commit_id);
> -		}
> +		else if (has_double_dash)
> +			die(_("'%s' does not appear to be a valid "
> +			      "revision"), arg);
> +		else
> +			break;

Thank you!

>  	}
>  	pathspec_pos = i;
>
> diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
> index b886529e59..70c39a9459 100755
> --- a/t/t6030-bisect-porcelain.sh
> +++ b/t/t6030-bisect-porcelain.sh
> @@ -82,6 +82,13 @@ test_expect_success 'bisect fails if given any junk instead of revs' '
>  	git bisect bad $HASH4
>  '
>
> +test_expect_success 'bisect start without -- uses unknown arg as path restriction' '

To avoid the overly long line (and also to re-use existing naming
conventions), I replaced "path restrictions" by "pathspecs" here. What do
you think?

Ciao,
Dscho

> +	git bisect reset &&
> +	git bisect start foo bar &&
> +	grep foo ".git/BISECT_NAMES" &&
> +	grep bar ".git/BISECT_NAMES"
> +'
> +
>  test_expect_success 'bisect reset: back in the master branch' '
>  	git bisect reset &&
>  	echo "* master" > branch.expect &&
> --
> 2.28.0.585.ge1cfff6765
>
>

Christian Couder Sept. 24, 2020, 11:08 a.m. UTC | #2

Hi Dscho,

On Thu, Sep 24, 2020 at 11:59 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:

> On Thu, 24 Sep 2020, Christian Couder wrote:

> > -                       rev=$(git rev-parse -q --verify "$arg^{commit}") || {
> > -                               test $has_double_dash -eq 1 &&
> > -                               die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
> > -                               break
> > -                       }
> > -                       revs="$revs $rev"
>
> These are awfully long lines. The reason is that you kept the indentation
> of the diff. But that's actually not necessary, because we do not need to
> apply a diff here; This code snippet is intended purely for human
> consumption.
>
> What I suggested in my adaptation of your patch was to lose the diff
> markers and to decrease the insane amount of indentation to just one (and
> two) horizontal tabs.

Yeah, I didn't realize that.

When I am sent some code or patch like this, I often hesitate between:

- using it verbatim, which can create issues as it makes me more
likely to overlook something in the case the sender didn't fully check
everything
- looking at the differences with the existing code/patch and applying
them one by one, which has the risk of missing or forgetting a
difference

I guess the best would be to do both and then check the differences
between the 2 results, but it feels like twice the amount of work for
this step.

> > diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
> > index b886529e59..70c39a9459 100755
> > --- a/t/t6030-bisect-porcelain.sh
> > +++ b/t/t6030-bisect-porcelain.sh
> > @@ -82,6 +82,13 @@ test_expect_success 'bisect fails if given any junk instead of revs' '
> >       git bisect bad $HASH4
> >  '
> >
> > +test_expect_success 'bisect start without -- uses unknown arg as path restriction' '
>
> To avoid the overly long line (and also to re-use existing naming
> conventions), I replaced "path restrictions" by "pathspecs" here. What do
> you think?

It's not a huge issue, but I tend to prefer using "restrictions"
because the tests that check that these arguments are used properly
are called "restricting bisection on one dir" and "restricting
bisection on one dir and a file". So I feel that it keeps test names
more coherent.

Best,
Christian.

Junio C Hamano Sept. 24, 2020, 4:44 p.m. UTC | #3

Christian Couder <christian.couder@gmail.com> writes:

>> > +test_expect_success 'bisect start without -- uses unknown arg as path restriction' '
>>
>> To avoid the overly long line (and also to re-use existing naming
>> conventions), I replaced "path restrictions" by "pathspecs" here. What do
>> you think?
>
> It's not a huge issue, but I tend to prefer using "restrictions"
> because the tests that check that these arguments are used properly
> are called "restricting bisection on one dir" and "restricting
> bisection on one dir and a file". So I feel that it keeps test names
> more coherent.

Hmph, but in the context of a sentence "use an arg as X", we should
try to pick a well-known word to describe various Git arguments for
X, no?  The one you are using, in order to filter the set of commits
involved in the operation to those that touch specific set of paths,
already has its own established name and the word is "pathspec".

So...?

Junio C Hamano Sept. 24, 2020, 6:55 p.m. UTC | #4

Christian Couder <christian.couder@gmail.com> writes:

> -		} else {
> -			char *commit_id = xstrfmt("%s^{commit}", arg);
> -			if (get_oid(commit_id, &oid) && has_double_dash)
> -				die(_("'%s' does not appear to be a valid "
> -				      "revision"), arg);
> -
> +		} else if (!get_oid_committish(arg, &oid))

This is wrong, I think.  The "_committish" only applies to the fact
that the disambiguation includes only the objects that are
committishes, and the result are not peeled---you'll get an
annotated tag back in oid, if you give it an annotated tag that
points at a commit.

This is not what ^{commit} does.

It may happen to work if the downstream consumers peel objects
(which are appended to the 'revs' here) down to commit when they are
used, and if somebody verified that is indeed the case, it would be
OK, but if this patch is presented as "unlike the previous broken
one, this is the faithful conversion of the original in shell, so
there is no need to verify anything outside of the patch context",
that is a problem.

We may need to audit recent additions of get_oid_committish() calls
in our codebase.  I suspect there may be other instances of the same
mistake.

Other than that, the code structure does look more straight-forward
compared to the previous round.  A fix on this version would involve
peeling what is in oid down to commit, and you need to handle errors
during that process, so it may not stay pretty with a fix, though.
I dunno.

Thanks.

>  			string_list_append(&revs, oid_to_hex(&oid));
> -			free(commit_id);
> -		}
> +		else if (has_double_dash)
> +			die(_("'%s' does not appear to be a valid "
> +			      "revision"), arg);
> +		else
> +			break;
>  	}
>  	pathspec_pos = i;
>  
> diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
> index b886529e59..70c39a9459 100755
> --- a/t/t6030-bisect-porcelain.sh
> +++ b/t/t6030-bisect-porcelain.sh
> @@ -82,6 +82,13 @@ test_expect_success 'bisect fails if given any junk instead of revs' '
>  	git bisect bad $HASH4
>  '
>  
> +test_expect_success 'bisect start without -- uses unknown arg as path restriction' '
> +	git bisect reset &&
> +	git bisect start foo bar &&
> +	grep foo ".git/BISECT_NAMES" &&
> +	grep bar ".git/BISECT_NAMES"
> +'
> +
>  test_expect_success 'bisect reset: back in the master branch' '
>  	git bisect reset &&
>  	echo "* master" > branch.expect &&

Junio C Hamano Sept. 24, 2020, 7:25 p.m. UTC | #5

Junio C Hamano <gitster@pobox.com> writes:

> We may need to audit recent additions of get_oid_committish() calls
> in our codebase.  I suspect there may be other instances of the same
> mistake.
>
> Other than that, the code structure does look more straight-forward
> compared to the previous round.  A fix on this version would involve
> peeling what is in oid down to commit, and you need to handle errors
> during that process, so it may not stay pretty with a fix, though.
> I dunno.

I'll queue it with this band-aid on top for now.

Thanks.

 builtin/bisect--helper.c    | 7 ++++---
 t/t6030-bisect-porcelain.sh | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index a1f97e3f6c..2fcc023a3b 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -474,13 +474,14 @@ static int bisect_start(struct bisect_terms *terms, int no_checkout,
 		} else if (starts_with(arg, "--") &&
 			 !one_of(arg, "--term-good", "--term-bad", NULL)) {
 			return error(_("unrecognized option: '%s'"), arg);
-		} else if (!get_oid_committish(arg, &oid))
+		} else if (!get_oidf(&oid, "%s^{commit}", arg)) {
 			string_list_append(&revs, oid_to_hex(&oid));
-		else if (has_double_dash)
+		} else if (has_double_dash) {
 			die(_("'%s' does not appear to be a valid "
 			      "revision"), arg);
-		else
+		} else {
 			break;
+		}
 	}
 	pathspec_pos = i;
 
diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 94179b6acf..4dbfa63ca1 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -82,7 +82,7 @@ test_expect_success 'bisect fails if given any junk instead of revs' '
 	git bisect bad $HASH4
 '
 
-test_expect_success 'bisect start without -- uses unknown arg as path restriction' '
+test_expect_success 'bisect start without -- takes unknown arg as pathspec' '
 	git bisect reset &&
 	git bisect start foo bar &&
 	grep foo ".git/BISECT_NAMES" &&

Junio C Hamano Sept. 24, 2020, 7:56 p.m. UTC | #6

Junio C Hamano <gitster@pobox.com> writes:

> We may need to audit recent additions of get_oid_committish() calls
> in our codebase.  I suspect there may be other instances of the same
> mistake.

Interim progress report.  I've only looked at files that use
get_oid_treeish() but audited all uses of get_oid_*ish() in them.

The results are as follows.

 * builtin/reset.c::parse_args() makes get_oid_committish() and
   get_oid_treeish() only to discard the object name, because it
   wants to ensure the args can be peeled down to such.  OK.

 * builtin/reset.c::cmd_reset() applies get_oid_committish() and
   get_oid_treeish() to the result taken from the above, but then
   uses lookup_commit_reference() and parse_tree_indirect() to peel
   the result to the desired type.  OK.

 * notes.c::init_notes() uses get_oid_treeish() to validate that the
   notes ref can be read as a tree, and then uses get_tree_entry()
   on it, which in turn uses read_object_with_reference() for
   tree_type so it tolerates a commit object.  OK.

I didn't audit the following hits of get_oid_committish().  There
might be a similar mistake as you made in v2, or there may not be.

I am undecided if I should just move on, marking them as
left-over-bits ;-)

builtin/blame.c:		if (get_oid_committish(i->string, &oid))
builtin/checkout.c:		repo_get_oid_committish(the_repository, branch->name, &branch->oid);
builtin/rev-parse.c:	if (!get_oid_committish(start, &start_oid) && !get_oid_committish(end, &end_oid)) {
builtin/rev-parse.c:	if (get_oid_committish(arg, &oid) ||
commit.c:	if (get_oid_committish(name, &oid))
revision.c:	if (get_oid_committish(arg, &oid))
sequencer.c:		    !get_oid_committish(buf.buf, &oid))
sha1-name.c:		st = repo_get_oid_committish(r, sb.buf, &oid_tmp);
sha1-name.c:	if (repo_get_oid_committish(r, dots[3] ? (dots + 3) : "HEAD", &oid_tmp))
sha1-name.c:int repo_get_oid_committish(struct repository *r,
t/helper/test-reach.c:		if (get_oid_committish(buf.buf + 2, &oid))

Junio C Hamano Sept. 24, 2020, 8:53 p.m. UTC | #7

Junio C Hamano <gitster@pobox.com> writes:

> I didn't audit the following hits of get_oid_committish().  There
> might be a similar mistake as you made in v2, or there may not be.
>
> I am undecided if I should just move on, marking them as
> left-over-bits ;-)
>
>
>
> builtin/blame.c:		if (get_oid_committish(i->string, &oid))

This one throws the object name of revs to be skipped to a list, and
because revision traversal works on commit objects, if the user
gives an annotated tag and expects the underlying commit is ignored,
it may appear as a bug.  But in the same function a list of revs to
be ignored is read from file using oidset_parse_file() that in turn
uses parse_oid_hex() without even validating if the named object
exists, I would say it is OK---after all, if it hurts, the user can
refrain from doing so ;-)

But it would be nice to fix all issues around this caller.  After
collecting the object names to an oidset, somebody should go through
the list, peel them down to commit and make sure they exist, or
something like that.  A possible #leftoverbits.

> builtin/checkout.c:		repo_get_oid_committish(the_repository, branch->name, &branch->oid);

This one is probably OK as branch refs are supposed to point at
commits and not annotated tags that point at commits.

> builtin/rev-parse.c:	if (!get_oid_committish(start, &start_oid) && !get_oid_committish(end, &end_oid)) {

This one handles "rev-parse v1.0..v2.0" which gives "^v1.0 v2.0" but
using the (unpeeled) object name.  It is fine and should not be
changed to auto-peel.

> builtin/rev-parse.c:	if (get_oid_committish(arg, &oid) ||

This is immediately followed by lookup_commit_reference() to peel as
needed.  OK.

> commit.c:	if (get_oid_committish(name, &oid))

This is part of lookup_commit_reference_by_name(), which peels and
parses it down to an in-core commit object instance.  OK.

> revision.c:	if (get_oid_committish(arg, &oid))

This is followed by a loop to peel it as needed.  OK.

> sequencer.c:		    !get_oid_committish(buf.buf, &oid))

This feeds the contents of rebase-merge/stopped-sha file.  I presume
that the contents of this file (which is not directly shown to the
end users) is always a commit object name, so this is OK.  Use of
_committish() may probably be overkill for this internal bookkeeping
file.  If we stop make_patch() from shortening then probably we can
change it to parse_oid_hex() to expect and read the full object
name.

> sha1-name.c:		st = repo_get_oid_committish(r, sb.buf, &oid_tmp);
> sha1-name.c:	if (repo_get_oid_committish(r, dots[3] ? (dots + 3) : "HEAD", &oid_tmp))

Since I know those who wrote this old part of the codebase knew what
they were doing, I do not have to comment, but these are fine.  They
are all peeled to commit as appropriate by calling
lookup_commit_reference_gently() before feeding the result to
get_merge_bases().

> sha1-name.c:int repo_get_oid_committish(struct repository *r,

This is the implementation ;-)

> t/helper/test-reach.c:		if (get_oid_committish(buf.buf + 2, &oid))

This peels afterwards, so it is OK.

The true reason I went through all the callers was to see if _all_
the callers want to either ignore the resulting object name (i.e.
they want to make sure that the arg given can be peeled down to an
appropriate type) or wants the object name to be peeled to the type.
If that were the case (and from the above, it clearly isn't), we
could change the semantics of get_oid_*ish() so that the resulting
oid is already peeled down to the wanted type and that could simplify
the current callers that are peeling the result themselves.

But because some callers do not want to see the result peeled, we
shouldn't touch what get_oid_*ish() functions do.

Christian Couder Sept. 25, 2020, 1:09 p.m. UTC | #8

On Thu, Sep 24, 2020 at 8:55 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > -             } else {
> > -                     char *commit_id = xstrfmt("%s^{commit}", arg);
> > -                     if (get_oid(commit_id, &oid) && has_double_dash)
> > -                             die(_("'%s' does not appear to be a valid "
> > -                                   "revision"), arg);
> > -
> > +             } else if (!get_oid_committish(arg, &oid))
>
> This is wrong, I think.  The "_committish" only applies to the fact
> that the disambiguation includes only the objects that are
> committishes, and the result are not peeled---you'll get an
> annotated tag back in oid, if you give it an annotated tag that
> points at a commit.
>
> This is not what ^{commit} does.

Thanks for finding this.

> It may happen to work if the downstream consumers peel objects
> (which are appended to the 'revs' here) down to commit when they are
> used, and if somebody verified that is indeed the case, it would be
> OK, but if this patch is presented as "unlike the previous broken
> one, this is the faithful conversion of the original in shell, so
> there is no need to verify anything outside of the patch context",
> that is a problem.

I agree that it's better if there is no need to verify anything
outside of the patch context. So I agree with your fix.

I am also ok with using "pathspec" in the test description of the new test.

Thanks,
Christian.

[v2] bisect: don't use invalid oid as rev when starting

Commit Message

Comments

Patch