describe: output tag's ref instead of embedded name
diff mbox series

Message ID fcf19a46b80322c5579142efe4ec681a4dcbdd28.1581802264.git.matheus.bernardino@usp.br
State New
Headers show
Series
  • describe: output tag's ref instead of embedded name
Related show

Commit Message

Matheus Tavares Bernardino Feb. 15, 2020, 9:34 p.m. UTC
If a tag describes a commit, we currently output not the tag's ref but
its embedded name. This means that when the tag is locally stored under
a different name, the output given cannot be used to access the tag in
any way. A warning is also emitted in this case, but the message is not
very enlightening:

$ git tag -am "testing tag body" testing-tag
$ mv .git/refs/tags/testing-tag .git/refs/tags/testing-tag-with-new-name
$ git describe --tags --abbrev=0
warning: tag 'testing-tag' is really 'testing-tag-with-new-name' here
testing-tag

Let's make git-describe output the tag's local name instead and
rephrase the warning to reflect the situation a little better.

Since the embedded name will no longer be needed for correct output, we
can convert the die() call in append_name() when a tag doesn't have the
embedded name to a warning(). Also, this function will now have two
disconnected responsibilities: verifying if the tag's embedded name
matches the ref and actually appending the ref to a given buffer (which
does not depend on the parsed tag object itself). Thus, to increase
intelligibility, let's make the function specialize in the former and do
the latter outside it.

Suggested-by: Jeff King <peff@peff.net>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/describe.c  | 23 ++++++++---------------
 t/t6120-describe.sh |  4 ++--
 2 files changed, 10 insertions(+), 17 deletions(-)

Comments

Jeff King Feb. 16, 2020, 6:51 a.m. UTC | #1
On Sat, Feb 15, 2020 at 06:34:13PM -0300, Matheus Tavares wrote:

> If a tag describes a commit, we currently output not the tag's ref but
> its embedded name. This means that when the tag is locally stored under
> a different name, the output given cannot be used to access the tag in
> any way. A warning is also emitted in this case, but the message is not
> very enlightening:
> 
> $ git tag -am "testing tag body" testing-tag
> $ mv .git/refs/tags/testing-tag .git/refs/tags/testing-tag-with-new-name
> $ git describe --tags --abbrev=0
> warning: tag 'testing-tag' is really 'testing-tag-with-new-name' here
> testing-tag
> 
> Let's make git-describe output the tag's local name instead and
> rephrase the warning to reflect the situation a little better.

Thanks. I had this on my "eh, we should probably do it before we forget"
pile, so I was quite happy to see that somebody had already handled it. :)

I think the conversion of the die() to warning() makes sense here. Do we
want to cover that with a test?

> Since the embedded name will no longer be needed for correct output, we
> can convert the die() call in append_name() when a tag doesn't have the
> embedded name to a warning(). Also, this function will now have two
> disconnected responsibilities: verifying if the tag's embedded name
> matches the ref and actually appending the ref to a given buffer (which
> does not depend on the parsed tag object itself). Thus, to increase
> intelligibility, let's make the function specialize in the former and do
> the latter outside it.

Pulling the buffer append out of the function makes a lot of sense as
well. I was puzzled at first that this old code:

> -
> -	if (n->tag) {
> -		if (all)
> -			strbuf_addstr(dst, "tags/");
> -		strbuf_addstr(dst, n->tag->tag);
> -	} else {
> -		strbuf_addstr(dst, n->path);
> -	}
>  }

...used "tags/" for "--all", but the new code doesn't:

> @@ -313,7 +305,8 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
>  		/*
>  		 * Exact match to an existing ref.
>  		 */
> -		append_name(n, dst);
> +		verify_tag_embedded_name(n);
> +		strbuf_addstr(dst, n->path);

But that makes sense. As the else clause in the old code shows, our
n->path is already set up correctly. So not only are we splitting out
the "append" concern, but we're able to make it simpler. Good.

> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 09c50f3f04..d34c091e0b 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -129,9 +129,9 @@ test_expect_success 'rename tag A to Q locally' '
>  	mv .git/refs/tags/A .git/refs/tags/Q
>  '
>  cat - >err.expect <<EOF
> -warning: tag 'A' is really 'Q' here
> +warning: tag 'Q' is externally known as 'A'
>  EOF
> -check_describe A-* HEAD
> +check_describe Q-* HEAD
>  test_expect_success 'warning was displayed for Q' '
>  	test_i18ncmp err.expect err.actual

And this test change makes sense. I had to look up how check_describe
works, but that first argument is the expected output.

So the whole thing looks good to me, with or without the die/warning
test I mentioned above.

-Peff
Junio C Hamano Feb. 18, 2020, 7:31 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> I think the conversion of the die() to warning() makes sense here. Do we
> want to cover that with a test?

I presume that you are talking about this case.

>  	if (n->tag && !n->name_checked) {
>  		if (!n->tag->tag)
> -			die(_("annotated tag %s has no embedded name"), n->path);
> +			warning(_("annotated tag %s has no embedded name"), n->path);

The attached is my attempt to craft such a test.  It turns out that
it is tricky to trigger this die/warning.  I haven't dug deeply
enough, but I suspect this might be a dead code now.

A few curious points about the attached *test* that does not work.

 * "git tag -a" cannot create a tag without tagname.  We need to
   resort to "git hash-object -t tag".

 * "git hash-object -t tag" has internal consistency check, and
   rejects input that lack the tagname.  We need to resort to the
   --literally option.

 * Instead of "cat U.objname >.git/refs/tags/U", the method that is
   agnostic to the ref backend implementation, such as "git tag U
   $(cat U.objname)" or "git update-ref refs/tags/U $(cat
   U.objname)" should be usable and should be used.  But they
   complain that the object whose name is $(cat U.objname) does
   *NOT* exist.  I suspect we are triggering an error return from
   parse_tag_buffer() and parse_tag(), which tells
   parse_object_buffer() to return NULL, which in turn means there
   is no such object to its caller.

And of course, "check_describe U A^1" does not even see "U" and does
not complain.  I think that happens way before this part of the
code.  add_to_known_names() calls replace_name() that in turn calls
parse_tag() and a malformed annotated tag there won't even become
a candidate to describe the given commit, I think.

So, we might want to revisit this, analyze what happens fully, and
replace it die/warning with a BUG(), if it turns out to be a dead
code.

 t/t6120-describe.sh | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 960fd99bb1..7544278782 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -6,7 +6,7 @@ test_description='test describe
         .--------------o----o----o----x
        /                   /    /
  o----o----o----o----o----.    /
-       \        A    c        /
+       \   U    A    c        /
         .------------o---o---o
                    D,R   e
 '
@@ -140,6 +140,20 @@ test_expect_success 'rename tag Q back to A' '
 	mv .git/refs/tags/Q .git/refs/tags/A
 '
 
+test_expect_success 'warn for annotated tag without tagname' '
+	git cat-file tag A >A.txt &&
+	sed -e "s/object .*/object $(git rev-parse A^1)/" \
+	    -e "/^tag A/d" A.txt |
+	git hash-object -w --literally --stdin -t tag >U.objname &&
+	cat U.objname >.git/refs/tags/U &&
+
+	check_describe U A^1 &&
+	cat >err.expect <<-\EOF &&
+	warning: annotated tag U has no embedded name
+	EOF
+	test_i18ncmp err.expect err.actual
+'
+
 test_expect_success 'pack tag refs' 'git pack-refs'
 check_describe A-* HEAD
Jeff King Feb. 18, 2020, 7:54 p.m. UTC | #3
On Tue, Feb 18, 2020 at 11:31:35AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I think the conversion of the die() to warning() makes sense here. Do we
> > want to cover that with a test?
> 
> I presume that you are talking about this case.
> 
> >  	if (n->tag && !n->name_checked) {
> >  		if (!n->tag->tag)
> > -			die(_("annotated tag %s has no embedded name"), n->path);
> > +			warning(_("annotated tag %s has no embedded name"), n->path);

Yep, that's the one.

> The attached is my attempt to craft such a test.  It turns out that
> it is tricky to trigger this die/warning.  I haven't dug deeply
> enough, but I suspect this might be a dead code now.

Thanks for digging so deeply on this. I think you're right that we'll
never get a "struct tag" with a NULL tag field because
parse_tag_buffer() unconditionally puts something there or returns an
error.

We probably used to be able to trigger this by "double parsing" the tag
(probably with two refs pointing at the same tag object), in which case
the first would mark it as parsed but return an error, and the second
would quietly return "oh, it's already parsed". But I fixed that
recently in 228c78fbd4 (commit, tag: don't set parsed bit for parse
failures, 2019-10-25).

So yeah, I think I'd be OK to turn this into a BUG(), or just eliminate
it entirely (we'd segfault, which is BUG-equivalent ;) ).

-Peff
Junio C Hamano Feb. 18, 2020, 11:05 p.m. UTC | #4
Coming back to the original topic, ...

> If a tag describes a commit, we currently output not the tag's ref but
> its embedded name. This means that when the tag is locally stored under
> a different name, the output given cannot be used to access the tag in
> any way. A warning is also emitted in this case, but the message is not
> very enlightening:

None of the above is wrong per-se, but the reason why we chose to
use the real name of the tag in the tag object, while issuing a
warning, was so that people can correct the mistake of storing an
annotated tag A to a wrong refname Q.  If "describe" gave a name
based on refname Q, there is no incentive to correct the situation
to use the right refname.  The name that describes the commits
relative to the real name of the A is *not* usable by design, until
the refname is corrected (i.e. the tag is stored in the right
place).

So I am not 100% confident that the original patch is a good idea.
Junio C Hamano Feb. 18, 2020, 11:28 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> Coming back to the original topic, ...
>
>> If a tag describes a commit, we currently output not the tag's ref but
>> its embedded name. This means that when the tag is locally stored under
>> a different name, the output given cannot be used to access the tag in
>> any way. A warning is also emitted in this case, but the message is not
>> very enlightening:
>
> None of the above is wrong per-se, but the reason why we chose to
> use the real name of the tag in the tag object, while issuing a
> warning, was so that people can correct the mistake of storing an
> annotated tag A to a wrong refname Q.  If "describe" gave a name
> based on refname Q, there is no incentive to correct the situation
> to use the right refname.  The name that describes the commits
> relative to the real name of the A is *not* usable by design, until
> the refname is corrected (i.e. the tag is stored in the right
> place).
>
> So I am not 100% confident that the original patch is a good idea.

FWIW, this design came from 212945d4 ("Teach git-describe to verify
annotated tag names before output", 2008-02-28).  Shawn was quite
explicit that use of the real name was deliberate:

    If an annotated tag describes a commit we want to favor the name
    listed in the body of the tag, rather than whatever name it has
    been stored under locally.  By doing so it is easier to converse
    about tags with others, even if the tags happen to be fetched to
    a different name than it was given by its creator.

and I tend to agree with the original rationale.
Jeff King Feb. 19, 2020, 1:57 a.m. UTC | #6
On Tue, Feb 18, 2020 at 03:28:24PM -0800, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Coming back to the original topic, ...
> >
> >> If a tag describes a commit, we currently output not the tag's ref but
> >> its embedded name. This means that when the tag is locally stored under
> >> a different name, the output given cannot be used to access the tag in
> >> any way. A warning is also emitted in this case, but the message is not
> >> very enlightening:
> >
> > None of the above is wrong per-se, but the reason why we chose to
> > use the real name of the tag in the tag object, while issuing a
> > warning, was so that people can correct the mistake of storing an
> > annotated tag A to a wrong refname Q.  If "describe" gave a name
> > based on refname Q, there is no incentive to correct the situation
> > to use the right refname.  The name that describes the commits
> > relative to the real name of the A is *not* usable by design, until
> > the refname is corrected (i.e. the tag is stored in the right
> > place).
> >
> > So I am not 100% confident that the original patch is a good idea.
> 
> FWIW, this design came from 212945d4 ("Teach git-describe to verify
> annotated tag names before output", 2008-02-28).  Shawn was quite
> explicit that use of the real name was deliberate:
> 
>     If an annotated tag describes a commit we want to favor the name
>     listed in the body of the tag, rather than whatever name it has
>     been stored under locally.  By doing so it is easier to converse
>     about tags with others, even if the tags happen to be fetched to
>     a different name than it was given by its creator.
> 
> and I tend to agree with the original rationale.

Thanks, I should have dug into the history in the first place.

Still, I'm not entirely convinced. As a decentralized system, I think
our first duty is to make things convenient and workable for the
preferences of the local repository, and second to facilitate
communication with other people's clones of the same repository.

If for whatever reason I chose to call my version of the global v1.0 tag
as "v1.0-bob", then it seems friendlier to me to report the name that
can actually be used with further local commands (and remind the user of
the global name) than the other way around.

Though TBH the situation is rare enough that I kind of doubt it matters
all that much either way. It's been like this for over a decade, and
this is the first time I recall it being brought up.

-Peff
Junio C Hamano Feb. 19, 2020, 3:22 a.m. UTC | #7
Jeff King <peff@peff.net> writes:

>> FWIW, this design came from 212945d4 ("Teach git-describe to verify
>> annotated tag names before output", 2008-02-28).  Shawn was quite
>> explicit that use of the real name was deliberate:
>> 
>>     If an annotated tag describes a commit we want to favor the name
>>     listed in the body of the tag, rather than whatever name it has
>>     been stored under locally.  By doing so it is easier to converse
>>     about tags with others, even if the tags happen to be fetched to
>>     a different name than it was given by its creator.
>> 
>> and I tend to agree with the original rationale.
>
> Thanks, I should have dug into the history in the first place.
>
> Still, I'm not entirely convinced. As a decentralized system, I think
> our first duty is to make things convenient and workable for the
> preferences of the local repository, and second to facilitate
> communication with other people's clones of the same repository.

Yes, and that can be done by either (1) locally moving a tag that is
stored in a wrong location to where it wants to be, or (2) locally
*creating* a tag that suits the preferences of the local repository,
ignoring the tag obtained from outside world that is stored in a
wrong place.  The latter would not help to facilitate communication,
though.

> If for whatever reason I chose to call my version of the global v1.0 tag
> as "v1.0-bob", then it seems friendlier to me to report the name that
> can actually be used with further local commands (and remind the user of
> the global name) than the other way around.

That you can do with "git tag v1.0-bob <whatever object>" locally, no?

> Though TBH the situation is rare enough that I kind of doubt it matters
> all that much either way. It's been like this for over a decade, and
> this is the first time I recall it being brought up.

Yeah, I do not think this is an often-arising concern.  It's merely
what the expected and recommended direction to escape when it
happens, and what the warning message should say to make the
recommendation communicated better, I think.

Note that I started this to play a devil's advocate.  As an object
is immutable, while it can be named with any refname, if easing
communication between project participants is one of the goals, it
seems that taking what is in the object as authoritative is the only
workable way.
Jeff King Feb. 19, 2020, 3:56 a.m. UTC | #8
On Tue, Feb 18, 2020 at 07:22:02PM -0800, Junio C Hamano wrote:

> > Still, I'm not entirely convinced. As a decentralized system, I think
> > our first duty is to make things convenient and workable for the
> > preferences of the local repository, and second to facilitate
> > communication with other people's clones of the same repository.
> 
> Yes, and that can be done by either (1) locally moving a tag that is
> stored in a wrong location to where it wants to be, or (2) locally
> *creating* a tag that suits the preferences of the local repository,
> ignoring the tag obtained from outside world that is stored in a
> wrong place.  The latter would not help to facilitate communication,
> though.

I think a left a few things unsaid in my "v1.0-bob" example. I was
imagining that there are _two_ v1.0 tags floating around. One that you
consider "wrong", tagged by Bob, and one you like. You keep the latter
in refs/tags/v1.0.

I suppose that technically is your (1), as you renamed Bob's wrong tag
locally.

I think this is a steady-state that makes sense. Bob erroneously called
his tag "v1.0", but you want to hold on to it for talking to Bob, or
understanding Bob's world-view.

Now imagine you run "git describe", and it says:

  $ git describe HEAD
  warning: tag 'v1.0' is really 'v1.0-bob' here
  v1.0

Does that mean we found "v1.0-bob"? Or did we pass through looking at
Bob's v1.0 and eventually found our own "real" v1.0? (I'm actually not
sure if the "pass through" case would even generate such a warning).

But it seems like saying "we found v1.0-bob" unambiguously would make
the output more useful. Another way of thinking about it is that you can
only have one ref namespace locally, and therefore only one
refs/tags/foo; but you can have an infinite number of tags that claim to
be "foo".

> > If for whatever reason I chose to call my version of the global v1.0 tag
> > as "v1.0-bob", then it seems friendlier to me to report the name that
> > can actually be used with further local commands (and remind the user of
> > the global name) than the other way around.
> 
> That you can do with "git tag v1.0-bob <whatever object>" locally, no?

Not if you already have something else there, as above.

> > Though TBH the situation is rare enough that I kind of doubt it matters
> > all that much either way. It's been like this for over a decade, and
> > this is the first time I recall it being brought up.
> 
> Yeah, I do not think this is an often-arising concern.  It's merely
> what the expected and recommended direction to escape when it
> happens, and what the warning message should say to make the
> recommendation communicated better, I think.

Yeah, I'll be the first to admit that the example above is totally
contrived. I've never run into it myself.  But then, I've never seen
this warning in the wild either.  The usual resolution, I imagine, would
be "oops, I was trying to do some tricky with tags and screwed it up;
let me fix the name". And if we are going to give any advise() hint, I
think it would be that (and people who really do want a long-running
v1.0-bob would presumably want to disable that advice).

-Peff
Roland Hieber Feb. 19, 2020, 10:08 a.m. UTC | #9
On Tue, Feb 18, 2020 at 07:22:02PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> >> FWIW, this design came from 212945d4 ("Teach git-describe to verify
> >> annotated tag names before output", 2008-02-28).  Shawn was quite
> >> explicit that use of the real name was deliberate:
> >> 
> >>     If an annotated tag describes a commit we want to favor the name
> >>     listed in the body of the tag, rather than whatever name it has
> >>     been stored under locally.  By doing so it is easier to converse
> >>     about tags with others, even if the tags happen to be fetched to
> >>     a different name than it was given by its creator.
> >> 
> >> and I tend to agree with the original rationale.
> >
> > Thanks, I should have dug into the history in the first place.
> >
> > Still, I'm not entirely convinced. As a decentralized system, I think
> > our first duty is to make things convenient and workable for the
> > preferences of the local repository, and second to facilitate
> > communication with other people's clones of the same repository.
> 
> Yes, and that can be done by either (1) locally moving a tag that is
> stored in a wrong location to where it wants to be, or (2) locally
> *creating* a tag that suits the preferences of the local repository,
> ignoring the tag obtained from outside world that is stored in a
> wrong place.  The latter would not help to facilitate communication,
> though.

FWIW, I think the reason here was a move from SVN to git by the help of
git-svn, and in that process deciding that tag names should follow a
different (less out-dated?) format.

 - Roland
Junio C Hamano Feb. 19, 2020, 11:14 a.m. UTC | #10
Jeff King <peff@peff.net> writes:

> I think a left a few things unsaid in my "v1.0-bob" example. I was
> imagining that there are _two_ v1.0 tags floating around. One that you
> consider "wrong", tagged by Bob, and one you like. You keep the latter
> in refs/tags/v1.0.

Ahh, OK.  

To continue playing devil's advocate and to step back a bit,

 - The "git describe" command finds that the given commit is
   "closest" to that tag Bob called "v1.0".

 - But if it outputs "v1.0" like the current code does, it cannot be
   fed back to get_oid() to name the right object, if the given commit
   is "at" the tag (i.e. there is no "-$number-g$objectname" suffix),
   which is a problem.  We want "git describe" to give an output
   usable with get_oid() and the name must refer to the correct
   object (i.e. the one given to "git describe" as an input).

There are multiple approaches to make it possible to feed the output
back to get_oid() to obtain correct result:

 - We can describe it based on "v1.0-bob" (i.e. refname, not
   tagname), which is what the proposed patch wants to do.  

   This is nice as it simply exploits the fact that namespace of the
   refs ensures that there can locally be only one tag "v1.0".

 - We can always add "-$number-g$objectname" suffix when we need to
   base the output of "git describe" on such a tag whose name does
   not match its refname.  Both names "v1.0-bob-0-g0123456" and
   "v1.0-0-g0123456" would be accepted by get_oid() and refer to the
   same object 0123456.  

   If we do this, there is no longer any requirement that the
   "tagname" part alone of the output must be usable with get_oid()
   to name the correct object, and because "v1.0-bob" is merely a
   local name external people may not even know about, using
   "v1.0-bob-0-g0123456" may be less desirable than using
   "v1.0-0-g0123456".

 - We can skip the "wrong" tag and not use it to name anything.
   "git describe" may use some other tag that is stored in its right
   place, which may be a bit more distant than tag "v1.0-bob".  The
   warning message may indicate "we would have used this tag to name
   the object because it was the closest, but we skipped because of
   a tagname anomaly".  

   This may be what the person who moved Bob's "v1.0" to "v1.0-bob"
   intended---it is a "wrong" tag that is kept only to communicate
   _about_ the tag, not to be used to refer to other objects
   relative to it.

The "use the refname" and the "use -0-gXXXXXX suffix if using a tag
at a wrong place" approaches are simple and would produce a more
"correct" result than what we currently give.  The "don't use it"
approach simply sidesteps the problem altogether, which is sort-of
attractive ;-)
Jeff King Feb. 20, 2020, 11:25 a.m. UTC | #11
On Wed, Feb 19, 2020 at 03:14:14AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I think a left a few things unsaid in my "v1.0-bob" example. I was
> > imagining that there are _two_ v1.0 tags floating around. One that you
> > consider "wrong", tagged by Bob, and one you like. You keep the latter
> > in refs/tags/v1.0.
> 
> Ahh, OK.  
> 
> To continue playing devil's advocate and to step back a bit,
> 
>  - The "git describe" command finds that the given commit is
>    "closest" to that tag Bob called "v1.0".
> 
>  - But if it outputs "v1.0" like the current code does, it cannot be
>    fed back to get_oid() to name the right object, if the given commit
>    is "at" the tag (i.e. there is no "-$number-g$objectname" suffix),
>    which is a problem.  We want "git describe" to give an output
>    usable with get_oid() and the name must refer to the correct
>    object (i.e. the one given to "git describe" as an input).
> 
> There are multiple approaches to make it possible to feed the output
> back to get_oid() to obtain correct result:
> [...]

Thanks for clearly laying out your thinking. All of what you wrote makes
sense to me and I'd be OK with any of the options you described.

The "-g$objectname" one is kind of clever, and definitely not something
I had thought of. We already have "--long", and of course we'd show the
long version for any name that isn't an exact match anyway. So as an
added bonus, it seems unlikely to surprise anybody who is expecting the
current "show the tag, not the refname" output (though again, this is
rare enough that I think people simply expect them to be the same ;) ).

-Peff
Junio C Hamano Feb. 20, 2020, 5:34 p.m. UTC | #12
Jeff King <peff@peff.net> writes:

> On Wed, Feb 19, 2020 at 03:14:14AM -0800, Junio C Hamano wrote:
>
>> To continue playing devil's advocate and to step back a bit,
>> 
>>  - The "git describe" command finds that the given commit is
>>    "closest" to that tag Bob called "v1.0".
>> 
>>  - But if it outputs "v1.0" like the current code does, it cannot be
>>    fed back to get_oid() to name the right object, if the given commit
>>    is "at" the tag (i.e. there is no "-$number-g$objectname" suffix),
>>    which is a problem.  We want "git describe" to give an output
>>    usable with get_oid() and the name must refer to the correct
>>    object (i.e. the one given to "git describe" as an input).
>> 
>> There are multiple approaches to make it possible to feed the output
>> back to get_oid() to obtain correct result:
>> [...]
>
> Thanks for clearly laying out your thinking. All of what you wrote makes
> sense to me and I'd be OK with any of the options you described.
>
> The "-g$objectname" one is kind of clever, and definitely not something
> I had thought of. We already have "--long", and of course we'd show the
> long version for any name that isn't an exact match anyway. So as an
> added bonus, it seems unlikely to surprise anybody who is expecting the
> current "show the tag, not the refname" output (though again, this is
> rare enough that I think people simply expect them to be the same ;) ).

There is one thing you may have brought up in the discussion but I
did not touch.  Using v1.0-0-g0123456, based on tagname "v1.0" Bob
gave to it would still describe the right object, but if the user
forced "--no-long", it becomes unclear what we should do.

Saying "v1.0" like the current code does would re-introduce the
"output cannot be fed back to get_oid()" when refs/tags/v1.0 does
not exist, and when it does, get_oid() would yield a wrong object.

Another thing that is not satisfying is what should happen in "all"
mode.  We add "tags/" prefix so in the case we've been discussing,
the output would be "tags/v1.0-0-g0123456", but the whole reason why
we add the prefix is to say that the early part of the name, "v1.0",
is a tag, and it would be natural to associate it with refs/tags/v1.0
that is *not* Bob's tag.

Having said all that, here is what I have at this moment.

-- >8 --
Subject: describe: force long format for a name based on a mislocated tag

An annotated tag has two names: where it sits in the refs/tags
hierarchy and the tagname recorded in the "tag" field in the object
itself.  They usually should match.

Since 212945d4 ("Teach git-describe to verify annotated tag names
before output", 2008-02-28), a commit described using an annotated
tag bases its name on the tagname from the object.  While this was a
deliberate design decision to make it easier to converse about tags
with others, even if the tags happen to be fetched to a different
name than it was given by its creator, it had one downside.

The output from "git describe", at least in the modern Git, should
be usable as an object name to name the exact commit given to the
"git describe" command.  Using the tagname, when two names differ,
breaks this property, when describing a commit that is directly
pointed at by such a tag.  An annotated tag Bob made as "v1.0" may
sit at "refs/tags/v1.0-bob" in the ref hierarchy, and output from
"git describe v1.0-bob^0" would say "v1.0", but there may not be
any tag at "refs/tags/v1.0" locally or there may be another tag that
points at a different object.  

Note that this won't be a problem if a commit being described is not
directly pointed at by such a mislocated tag.  In the example in the
previous paragraph, "git describe v1.0-bob~1" would result in "v1.0"
(i.e. the tagname taken from the tag object) followed by "-1-gXXXXX"
where XXXXX is the abbreviated object name, and a string that ends
with "-g" followed by a hexadecimal string is an object name for the
object whose name begins with hexadecimal string (as long as it is
unique), so it does not matter if the leading part is "v1.0" or
"v1.0-bob".

Show the name in the long format, i.e. with "-0-gXXXXX" suffix, when
the name we give is based on a mislocated annotated tag to ensure
that the output can be used as the object name for the object
originally given to the command to fix the issue.

While at it, remove an overly cautious dead code to protect against
an annotated tag object without the tagname.  Such a tag is filtered
out much earlier in the codeflow, and will not reach this part of
the code.

Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/describe.c  | 13 ++++++++-----
 t/t6120-describe.sh | 10 +++++++++-
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index b6df81d8d0..5e8484f654 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -54,6 +54,7 @@ struct commit_name {
 	struct tag *tag;
 	unsigned prio:2; /* annotated tag = 2, tag = 1, head = 0 */
 	unsigned name_checked:1;
+	unsigned misnamed:1;
 	struct object_id oid;
 	char *path;
 };
@@ -132,6 +133,7 @@ static void add_to_known_names(const char *path,
 		e->tag = tag;
 		e->prio = prio;
 		e->name_checked = 0;
+		e->misnamed = 0;
 		oidcpy(&e->oid, oid);
 		free(e->path);
 		e->path = xstrdup(path);
@@ -275,10 +277,11 @@ static void append_name(struct commit_name *n, struct strbuf *dst)
 			die(_("annotated tag %s not available"), n->path);
 	}
 	if (n->tag && !n->name_checked) {
-		if (!n->tag->tag)
-			die(_("annotated tag %s has no embedded name"), n->path);
-		if (strcmp(n->tag->tag, all ? n->path + 5 : n->path))
-			warning(_("tag '%s' is really '%s' here"), n->tag->tag, n->path);
+		if (strcmp(n->tag->tag, all ? n->path + 5 : n->path)) {
+			warning(_("tag '%s' is externally known as '%s'"),
+				n->path, n->tag->tag);
+			n->misnamed = 1;
+		}
 		n->name_checked = 1;
 	}
 
@@ -314,7 +317,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		 * Exact match to an existing ref.
 		 */
 		append_name(n, dst);
-		if (longformat)
+		if (n->misnamed || longformat)
 			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : oid, dst);
 		if (suffix)
 			strbuf_addstr(dst, suffix);
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 45047d0a72..16a261c45d 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -130,12 +130,20 @@ test_expect_success 'rename tag A to Q locally' '
 	mv .git/refs/tags/A .git/refs/tags/Q
 '
 cat - >err.expect <<EOF
-warning: tag 'A' is really 'Q' here
+warning: tag 'Q' is externally known as 'A'
 EOF
 check_describe A-* HEAD
 test_expect_success 'warning was displayed for Q' '
 	test_i18ncmp err.expect err.actual
 '
+test_expect_success 'misnamed annotated tag forces long output' '
+	description=$(git describe --no-long Q^0) &&
+	expr "$description" : "A-0-g[0-9a-f]*$" &&
+	git rev-parse --verify "$description" >actual &&
+	git rev-parse --verify Q^0 >expect &&
+	test_cmp expect actual
+'
+
 test_expect_success 'rename tag Q back to A' '
 	mv .git/refs/tags/Q .git/refs/tags/A
 '
Matheus Tavares Bernardino Feb. 20, 2020, 10:19 p.m. UTC | #13
On Thu, Feb 20, 2020 at 2:34 PM Junio C Hamano <gitster@pobox.com> wrote:
>
[...]
> Another thing that is not satisfying is what should happen in "all"
> mode.  We add "tags/" prefix so in the case we've been discussing,
> the output would be "tags/v1.0-0-g0123456", but the whole reason why
> we add the prefix is to say that the early part of the name, "v1.0",
> is a tag, and it would be natural to associate it with refs/tags/v1.0
> that is *not* Bob's tag.

Yeah, that is not very satisfying, but at least the emitted warning
would make the user think twice before wrongly associating
refs/tags/v1.0 with Bob's tag?

> Having said all that, here is what I have at this moment.
>
> -- >8 --
> Subject: describe: force long format for a name based on a mislocated tag
[...]
>
> The output from "git describe", at least in the modern Git, should
> be usable as an object name to name the exact commit given to the
> "git describe" command.  Using the tagname, when two names differ,
> breaks this property, when describing a commit that is directly
> pointed at by such a tag.  An annotated tag Bob made as "v1.0" may
> sit at "refs/tags/v1.0-bob" in the ref hierarchy, and output from
> "git describe v1.0-bob^0" would say "v1.0", but there may not be
> any tag at "refs/tags/v1.0" locally or there may be another tag that
> points at a different object.
>
> Note that this won't be a problem if a commit being described is not
> directly pointed at by such a mislocated tag.  In the example in the
> previous paragraph, "git describe v1.0-bob~1" would result in "v1.0"
> (i.e. the tagname taken from the tag object) followed by "-1-gXXXXX"

I now this is just an illustration, but shouldn't this example be "git
describe --contains v1.0-bob~1"? Otherwise, I think the tag wouldn't
be found as it comes after the given commit. Testing here without
--contains I get the error "fatal: No tags can describe <sha1 hash>"

However, when using --contains, the output is also not what I
expected. It doesn't fail, but I get "v1.0-bob~1". I.e., it not only
prints the refname instead of the tagname but also don't print any
warnings... This is not what we want, right?

> Show the name in the long format, i.e. with "-0-gXXXXX" suffix, when
> the name we give is based on a mislocated annotated tag to ensure
> that the output can be used as the object name for the object
> originally given to the command to fix the issue.

Another case that came to my mind is when the user runs `git describe
--abrev=0 HEAD` and v1.0-bob points to HEAD~. In this case, v1.0 will
be displayed without suffix, so it is not usable at get_oid(). Should
we, perhaps, also force the long output in this case?

And talking about the forced long output, using `--abbrev=0` in HEAD
(i.e. tag: v1.0-bob), after the patch, will make the complete hash be
displayed. This might be an unnecessary cosmetic nitpick, but we might
want to follow the default behavior when --long is given and use just
as many digits as necessary?
Junio C Hamano Feb. 20, 2020, 10:59 p.m. UTC | #14
Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

> I now this is just an illustration, but shouldn't this example be "git
> describe --contains v1.0-bob~1"?

No, none of the patches discussed in this thread would not affect
anything in --contains (as it is a completely different program).
The example was bad to use one commit _before_; I meant to use one
commit _after_ the tag.

> Another case that came to my mind is when the user runs `git describe
> --abrev=0 HEAD` and v1.0-bob points to HEAD~. In this case, v1.0 will
> be displayed without suffix,...

In this case, v1.0-1- is followed by the full object name, I think.
Matheus Tavares Bernardino Feb. 21, 2020, 1:33 a.m. UTC | #15
On Thu, Feb 20, 2020 at 7:59 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:
>
> > I now this is just an illustration, but shouldn't this example be "git
> > describe --contains v1.0-bob~1"?
>
> No, none of the patches discussed in this thread would not affect
> anything in --contains (as it is a completely different program).

Ok, thanks for the clarification.

> > Another case that came to my mind is when the user runs `git describe
> > --abrev=0 HEAD` and v1.0-bob points to HEAD~. In this case, v1.0 will
> > be displayed without suffix,...
>
> In this case, v1.0-1- is followed by the full object name, I think.

I might be doing something wrong, but this is how I tried to test this: First I
set the desired scenario with:

$ git tag -am "" v1.0 HEAD~
$ mv .git/refs/tags/v1.0 .git/refs/tags/v1.0-bob

Then, running git-describe on v1.0-bob^0 I got what we want, which is the forced
long output (even with --abbrev=0):

$ git describe --abbrev=0 HEAD~
warning: tag 'v1.0-bob' is externally known as 'v1.0'
v1.0-0-g310a1f27be54ecf16fd36ff987304c1a2f8524b5

But running on the commit after v1.0-bob^0, I got:

$ git describe --abbrev=0 HEAD
warning: tag 'v1.0-bob' is externally known as 'v1.0'
v1.0

Then we cannot directly use the output. I think we can fix this forcing the long
output for this case as well, which can be done with:

diff --git a/builtin/describe.c b/builtin/describe.c
index 5e8484f654..f71bddff4a 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -452,7 +452,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 	}
 
 	append_name(all_matches[0].name, dst);
-	if (abbrev)
+	if (all_matches[0].name->misnamed || abbrev)
 		append_suffix(all_matches[0].depth, &cmit->object.oid, dst);
 	if (suffix)
 		strbuf_addstr(dst, suffix);
Junio C Hamano Feb. 21, 2020, 2:05 a.m. UTC | #16
Matheus Tavares <matheus.bernardino@usp.br> writes:

>> > Another case that came to my mind is when the user runs `git describe
>> > --abrev=0 HEAD` and v1.0-bob points to HEAD~. In this case, v1.0 will
>> > be displayed without suffix,...
>>
>> In this case, v1.0-1- is followed by the full object name, I think.
>
> I might be doing something wrong, but this is how I tried to test this:...

I only tried the commit exactly at the tag "describe --abbrev=0 v1.0-bob^0";
you're right---the "abbrev" option strips the suffix in a separate codepath.

> Then we cannot directly use the output. I think we can fix this forcing the long
> output for this case as well, which can be done with:

Thanks for a quick fix on top.

 builtin/describe.c  |  2 +-
 t/t6120-describe.sh | 12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 5e8484f654..f71bddff4a 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -452,7 +452,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 	}
 
 	append_name(all_matches[0].name, dst);
-	if (abbrev)
+	if (all_matches[0].name->misnamed || abbrev)
 		append_suffix(all_matches[0].depth, &cmit->object.oid, dst);
 	if (suffix)
 		strbuf_addstr(dst, suffix);
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 16a261c45d..8f35f18c3f 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -144,6 +144,18 @@ test_expect_success 'misnamed annotated tag forces long output' '
 	test_cmp expect actual
 '
 
+test_expect_success 'abbrev=0 will not break misplaced tag (1)' '
+	description=$(git describe --abbrev=0 Q^0) &&
+	expr "$description" : "A-0-g[0-9a-f]*$"
+'
+
+test_expect_success 'abbrev=0 will not break misplaced tag (2)' '
+	description=$(git describe --abbrev=0 c^0) &&
+	expr "$description" : "A-1-g[0-9a-f]*$"
+'
+
+exit
+
 test_expect_success 'rename tag Q back to A' '
 	mv .git/refs/tags/Q .git/refs/tags/A
 '
Jeff King Feb. 21, 2020, 5:58 a.m. UTC | #17
On Thu, Feb 20, 2020 at 09:34:36AM -0800, Junio C Hamano wrote:

> > The "-g$objectname" one is kind of clever, and definitely not something
> > I had thought of. We already have "--long", and of course we'd show the
> > long version for any name that isn't an exact match anyway. So as an
> > added bonus, it seems unlikely to surprise anybody who is expecting the
> > current "show the tag, not the refname" output (though again, this is
> > rare enough that I think people simply expect them to be the same ;) ).
> 
> There is one thing you may have brought up in the discussion but I
> did not touch.  Using v1.0-0-g0123456, based on tagname "v1.0" Bob
> gave to it would still describe the right object, but if the user
> forced "--no-long", it becomes unclear what we should do.

I think "--no-long" is not "do not ever write a long name". It is
"counteract an earlier request to _always_ print long names". I.e.:

  $ git describe --no-long v2.25.1^
  v2.25.0-99-g6141e0cc00

still produces long output regardless of your patch. And if we continue
to do so in the wrongly-named case (which your patch seems to), that
would be consistent.

> Another thing that is not satisfying is what should happen in "all"
> mode.  We add "tags/" prefix so in the case we've been discussing,
> the output would be "tags/v1.0-0-g0123456", but the whole reason why
> we add the prefix is to say that the early part of the name, "v1.0",
> is a tag, and it would be natural to associate it with refs/tags/v1.0
> that is *not* Bob's tag.

I agree that is not fantastic, but the same problem is there to some
degree even without the "tags/" prefix. The prefix just makes you think
more of the ref namespace. :) I think it's the best we can do, given
that we'll also have just issued a warning.

Speaking of which...

> Having said all that, here is what I have at this moment.
> [...]
> While at it, remove an overly cautious dead code to protect against
> an annotated tag object without the tagname.  Such a tag is filtered
> out much earlier in the codeflow, and will not reach this part of
> the code.

This patch also reverses the order of the warning from "is really" to
"is externally known as", but I didn't see it mentioned in the commit
message.

-Peff
Jeff King Feb. 21, 2020, 6 a.m. UTC | #18
On Thu, Feb 20, 2020 at 06:05:15PM -0800, Junio C Hamano wrote:

> Matheus Tavares <matheus.bernardino@usp.br> writes:
> 
> >> > Another case that came to my mind is when the user runs `git describe
> >> > --abrev=0 HEAD` and v1.0-bob points to HEAD~. In this case, v1.0 will
> >> > be displayed without suffix,...
> >>
> >> In this case, v1.0-1- is followed by the full object name, I think.
> >
> > I might be doing something wrong, but this is how I tried to test this:...
> 
> I only tried the commit exactly at the tag "describe --abbrev=0 v1.0-bob^0";
> you're right---the "abbrev" option strips the suffix in a separate codepath.
> 
> > Then we cannot directly use the output. I think we can fix this forcing the long
> > output for this case as well, which can be done with:
> 
> Thanks for a quick fix on top.

This fix makes sense. There's already a bit of ambiguity with
--no-abbrev, as it would output v1.0-1-g1234abcd as just "v1.0". But it
probably is best to draw the line here and not introduce even more
ambiguity.

-Peff

Patch
diff mbox series

diff --git a/builtin/describe.c b/builtin/describe.c
index b6df81d8d0..6a11604d2e 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -267,7 +267,7 @@  static unsigned long finish_depth_computation(
 	return seen_commits;
 }
 
-static void append_name(struct commit_name *n, struct strbuf *dst)
+static void verify_tag_embedded_name(struct commit_name *n)
 {
 	if (n->prio == 2 && !n->tag) {
 		n->tag = lookup_tag(the_repository, &n->oid);
@@ -276,19 +276,11 @@  static void append_name(struct commit_name *n, struct strbuf *dst)
 	}
 	if (n->tag && !n->name_checked) {
 		if (!n->tag->tag)
-			die(_("annotated tag %s has no embedded name"), n->path);
-		if (strcmp(n->tag->tag, all ? n->path + 5 : n->path))
-			warning(_("tag '%s' is really '%s' here"), n->tag->tag, n->path);
+			warning(_("annotated tag %s has no embedded name"), n->path);
+		else if (strcmp(n->tag->tag, all ? n->path + 5 : n->path))
+			warning(_("tag '%s' is externally known as '%s'"), n->path, n->tag->tag);
 		n->name_checked = 1;
 	}
-
-	if (n->tag) {
-		if (all)
-			strbuf_addstr(dst, "tags/");
-		strbuf_addstr(dst, n->tag->tag);
-	} else {
-		strbuf_addstr(dst, n->path);
-	}
 }
 
 static void append_suffix(int depth, const struct object_id *oid, struct strbuf *dst)
@@ -313,7 +305,8 @@  static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		/*
 		 * Exact match to an existing ref.
 		 */
-		append_name(n, dst);
+		verify_tag_embedded_name(n);
+		strbuf_addstr(dst, n->path);
 		if (longformat)
 			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : oid, dst);
 		if (suffix)
@@ -447,8 +440,8 @@  static void describe_commit(struct object_id *oid, struct strbuf *dst)
 				oid_to_hex(&gave_up_on->object.oid));
 		}
 	}
-
-	append_name(all_matches[0].name, dst);
+	verify_tag_embedded_name(all_matches[0].name);
+	strbuf_addstr(dst, all_matches[0].name->path);
 	if (abbrev)
 		append_suffix(all_matches[0].depth, &cmit->object.oid, dst);
 	if (suffix)
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 09c50f3f04..d34c091e0b 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -129,9 +129,9 @@  test_expect_success 'rename tag A to Q locally' '
 	mv .git/refs/tags/A .git/refs/tags/Q
 '
 cat - >err.expect <<EOF
-warning: tag 'A' is really 'Q' here
+warning: tag 'Q' is externally known as 'A'
 EOF
-check_describe A-* HEAD
+check_describe Q-* HEAD
 test_expect_success 'warning was displayed for Q' '
 	test_i18ncmp err.expect err.actual
 '