Message ID | ZNffWAgldUZdpQcr@farprobe (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [BUG] `git describe` doesn't traverse the graph in topological order | expand |
On Sat, Aug 12, 2023 at 15:36:56 -0400, Ben Boeckel wrote: > I found an issue where `git describe` doesn't find a "closer" tag than > another tag as the correct one to base the description off of. I have a > reproducer, but I'll first give details of the real world issue. Bump. Can anyone provide guidance as to what the best solution to this might be? Thanks, --Ben
On Friday, September 22, 2023 11:40 AM, Ben Boeckel wrote: >On Sat, Aug 12, 2023 at 15:36:56 -0400, Ben Boeckel wrote: >> I found an issue where `git describe` doesn't find a "closer" tag than >> another tag as the correct one to base the description off of. I have >> a reproducer, but I'll first give details of the real world issue. > >Bump. Can anyone provide guidance as to what the best solution to this might be? Can you provide details? `git describe` is sensitive to --first-parent and whether the tag has annotations. --Randall
On Fri, Sep 22, 2023 at 12:13:00 -0400, rsbecker@nexbridge.com wrote: > On Friday, September 22, 2023 11:40 AM, Ben Boeckel wrote: > >On Sat, Aug 12, 2023 at 15:36:56 -0400, Ben Boeckel wrote: > >> I found an issue where `git describe` doesn't find a "closer" tag than > >> another tag as the correct one to base the description off of. I have > >> a reproducer, but I'll first give details of the real world issue. > > > >Bump. Can anyone provide guidance as to what the best solution to this might be? > > Can you provide details? `git describe` is sensitive to --first-parent > and whether the tag has annotations. I provided more details and a reproducer in the original email: https://lore.kernel.org/git/ZNffWAgldUZdpQcr@farprobe/T/#u Thanks, --Ben
On Fri, Sep 22, 2023, at 18:13, rsbecker@nexbridge.com wrote: > On Friday, September 22, 2023 11:40 AM, Ben Boeckel wrote: >>On Sat, Aug 12, 2023 at 15:36:56 -0400, Ben Boeckel wrote: >>> I found an issue where `git describe` doesn't find a "closer" tag than >>> another tag as the correct one to base the description off of. I have >>> a reproducer, but I'll first give details of the real world issue. >> >>Bump. Can anyone provide guidance as to what the best solution to this might be? > > Can you provide details? `git describe` is sensitive to --first-parent > and whether the tag has annotations. > --Randall Both of the tags (`v9.3.0.rc0` and `v9.3.0.rc1`) are annotated ones.
On Friday, September 22, 2023 12:51 PM, Ben Boeckel wrote: >On Fri, Sep 22, 2023 at 12:13:00 -0400, rsbecker@nexbridge.com wrote: >> On Friday, September 22, 2023 11:40 AM, Ben Boeckel wrote: >> >On Sat, Aug 12, 2023 at 15:36:56 -0400, Ben Boeckel wrote: >> >> I found an issue where `git describe` doesn't find a "closer" tag >> >> than another tag as the correct one to base the description off of. >> >> I have a reproducer, but I'll first give details of the real world issue. >> > >> >Bump. Can anyone provide guidance as to what the best solution to this might be? >> >> Can you provide details? `git describe` is sensitive to --first-parent >> and whether the tag has annotations. > >I provided more details and a reproducer in the original email: > > https://lore.kernel.org/git/ZNffWAgldUZdpQcr@farprobe/T/#u As I indicated, the command is sensitive to --first-parent. For example: $ git describe v9.3.0.rc0-520-g1339e86833 $ git describe --first-parent v9.0.0.rc1-5143-g1339e86833 You have multiple parents in your tree of HEAD. This is probably confusing the interpretation. The most closely connected tag to HEAD is v9.3.0.rc0, from what I can read from your tree. Dates and times of the commit do not participate in this determination, to my knowledge. You can force selection of a subset of tags by specifying the --match=pattern argument. There appears to be a merge at 446120fd88 which brings v9.3.0.rc0 closer to HEAD than v9.3.0.rc1.
Looks related: Link: https://public-inbox.org/git/CABPp-BH2zuYe87xhjdp5v7M7i+EfEgLHAZgwfzJUAxGk1CFgfA@mail.gmail.com/ Message-ID: CABPp-BH2zuYe87xhjdp5v7M7i+EfEgLHAZgwfzJUAxGk1CFgfA@mail.gmail.com Via: https://stackoverflow.com/questions/72886894/git-describe-is-not-returning-the-expected-tag
On Fri, Sep 22, 2023 at 13:14:30 -0400, rsbecker@nexbridge.com wrote: > On Friday, September 22, 2023 12:51 PM, Ben Boeckel wrote: > >On Fri, Sep 22, 2023 at 12:13:00 -0400, rsbecker@nexbridge.com wrote: > >> On Friday, September 22, 2023 11:40 AM, Ben Boeckel wrote: > >> >On Sat, Aug 12, 2023 at 15:36:56 -0400, Ben Boeckel wrote: > >> >> I found an issue where `git describe` doesn't find a "closer" tag > >> >> than another tag as the correct one to base the description off of. > >> >> I have a reproducer, but I'll first give details of the real world issue. > >> > > >> >Bump. Can anyone provide guidance as to what the best solution to this might be? > >> > >> Can you provide details? `git describe` is sensitive to --first-parent > >> and whether the tag has annotations. > > > >I provided more details and a reproducer in the original email: > > > > https://lore.kernel.org/git/ZNffWAgldUZdpQcr@farprobe/T/#u > > As I indicated, the command is sensitive to --first-parent. For example: > > $ git describe > v9.3.0.rc0-520-g1339e86833 > $ git describe --first-parent > v9.0.0.rc1-5143-g1339e86833 Sorry, but this is just even more confusing to me as neither tag is on the first-parent history of `HEAD`. > You have multiple parents in your tree of HEAD. This is probably > confusing the interpretation. The most closely connected tag to HEAD > is v9.3.0.rc0, from what I can read from your tree. Dates and times of > the commit do not participate in this determination, to my knowledge. > You can force selection of a subset of tags by specifying the > --match=pattern argument. I don't see how that is possible since v9.3.0.rc0 is v9.3.0.rc1~2. Note the "not on the tag" commit count for the descriptions being wildly different. > There appears to be a merge at 446120fd88 which brings v9.3.0.rc0 > closer to HEAD than v9.3.0.rc1. That is still giving an incorrect description as there are *fewer* commits not on rc1 than rc0 relative to HEAD (as rc0 is an ancestor of rc1). Thanks, --Ben
On Fri, Sep 22, 2023 at 19:35:01 +0200, Kristoffer Haugsbakk wrote: > Looks related: > > Link: https://public-inbox.org/git/CABPp-BH2zuYe87xhjdp5v7M7i+EfEgLHAZgwfzJUAxGk1CFgfA@mail.gmail.com/ > Message-ID: CABPp-BH2zuYe87xhjdp5v7M7i+EfEgLHAZgwfzJUAxGk1CFgfA@mail.gmail.com > Via: https://stackoverflow.com/questions/72886894/git-describe-is-not-returning-the-expected-tag Thanks. It seems that these discussions previously determined the same (painful) pill: SZEDER Gábor at https://lore.kernel.org/git/20191008123156.GG11529@szeder.dev/: I think the proper way to fix this issue would be to make 'git describe' traverse the history in topographical order. Alas, I'm afraid this would result in a noticable performance penalty on big histories without a commit graph. The `sleep 1` is probably the remedy we'll use if time to actually fix this doesn't come up (or the "proper" fix is deemed as "too expensive"). Thanks for the links, --Ben
<rsbecker@nexbridge.com> writes:
> There appears to be a merge at 446120fd88 which brings v9.3.0.rc0 closer to HEAD than v9.3.0.rc1.
I didn't look at the actual graph but let me say I trust you ;-)
I wonder if there should be an obvious "explain why you gave this
name" mode added to the command, though. The command should be able
to say "The closest path from HEAD to any tag is via this, that, and
that commit, which is N hops to tag T0", and from there, the user
should be able to say "Oh, I thought T1 was closer, let me try again
to describe HEAD, limiting the candidate only to T1" and run the
command in that mode, which should be able to say "The closest path
from HEAD to any tag that is allowed as a candidate is via these
commits, which is M hops to tag T1". And if M is smaller than N,
then that may deserve to trigger a bug report (but as you said,
there are rules like preferring annotated over unannotated tags
involved, so it may not as straight-forward as comparing the two
integer hop counts).
Thanks for digging.
On Friday, September 22, 2023 1:52 PM, Junio C Hamano wrote: ><rsbecker@nexbridge.com> writes: > >> There appears to be a merge at 446120fd88 which brings v9.3.0.rc0 closer to HEAD >than v9.3.0.rc1. > >I didn't look at the actual graph but let me say I trust you ;-) > >I wonder if there should be an obvious "explain why you gave this name" mode added >to the command, though. The command should be able to say "The closest path from >HEAD to any tag is via this, that, and that commit, which is N hops to tag T0", and >from there, the user should be able to say "Oh, I thought T1 was closer, let me try >again to describe HEAD, limiting the candidate only to T1" and run the command in >that mode, which should be able to say "The closest path from HEAD to any tag that >is allowed as a candidate is via these commits, which is M hops to tag T1". And if M >is smaller than N, then that may deserve to trigger a bug report (but as you said, >there are rules like preferring annotated over unannotated tags involved, so it may >not as straight-forward as comparing the two integer hop counts). > >Thanks for digging. I'm wondering whether we need something more general that --first-parent. Perhaps something like git describe commitish [ commitish ... ] Where the traversal must cross the set of specified commitish points in history in order to find the expected tag. In Ben's case, I do not think that would help much, given the complexity of his history. Perhaps a --verbose argument might display the analysis path done by git describe as above. Sadly, I am not familiar with this code area. What confuses me is how, in the other subthread, that adding sleep 1 to the construction of history should make any difference. My understanding is that the path to the tag is invariant of the commit-date.
On Fri, Sep 22, 2023 at 10:51:59 -0700, Junio C Hamano wrote: > <rsbecker@nexbridge.com> writes: > > > There appears to be a merge at 446120fd88 which brings v9.3.0.rc0 closer to HEAD than v9.3.0.rc1. > > I didn't look at the actual graph but let me say I trust you ;-) > > I wonder if there should be an obvious "explain why you gave this > name" mode added to the command, though. The command should be able > to say "The closest path from HEAD to any tag is via this, that, and > that commit, which is N hops to tag T0", and from there, the user > should be able to say "Oh, I thought T1 was closer, let me try again > to describe HEAD, limiting the candidate only to T1" and run the > command in that mode, which should be able to say "The closest path > from HEAD to any tag that is allowed as a candidate is via these > commits, which is M hops to tag T1". And if M is smaller than N, > then that may deserve to trigger a bug report (but as you said, > there are rules like preferring annotated over unannotated tags > involved, so it may not as straight-forward as comparing the two > integer hop counts). The thing is that the count is what is wrong here, so the determination of what is "closer" is wrong. Any explanation would say things like "commit X~10 is not part of X". --Ben
On Fri, Sep 22, 2023 at 14:12:31 -0400, rsbecker@nexbridge.com wrote: > What confuses me is how, in the other subthread, that adding sleep 1 to the > construction of history should make any difference. My understanding is that > the path to the tag is invariant of the commit-date. Yes. It is explained that the commit date stored is only to 1 second granularity. Since the commits are stored in commit-date, an equal commit date ends up "twisting" the history and traversing some ancestors of commits before the commits themsevles. This loses the "seen" bit tracking that is done and ends up labeling way more commits as "not part of" ancestors. By sleeping for a second, the commit dates can be totally ordered reliably. And this tracks with my and the other thread's result that the traversal is not paying attention to the topological history properly. --Ben
On Friday, September 22, 2023 2:44 PM, Ben Boeckel wrote: >On Fri, Sep 22, 2023 at 14:12:31 -0400, rsbecker@nexbridge.com wrote: >> What confuses me is how, in the other subthread, that adding sleep 1 >> to the construction of history should make any difference. My >> understanding is that the path to the tag is invariant of the commit-date. > >Yes. It is explained that the commit date stored is only to 1 second granularity. Since >the commits are stored in commit-date, an equal commit date ends up "twisting" the >history and traversing some ancestors of commits before the commits themsevles. >This loses the "seen" bit tracking that is done and ends up labeling way more >commits as "not part of" ancestors. By sleeping for a second, the commit dates can >be totally ordered reliably. This is going to be awkward to resolve as time_t only resolves (portably) to 1 second intervals. I still would prefer the resolution to be path-based rather than time-based.
On Fri, Sep 22, 2023 at 14:49:58 -0400, rsbecker@nexbridge.com wrote: > On Friday, September 22, 2023 2:44 PM, Ben Boeckel wrote: > >Yes. It is explained that the commit date stored is only to 1 second granularity. Since > >the commits are stored in commit-date, an equal commit date ends up "twisting" the > >history and traversing some ancestors of commits before the commits themsevles. > >This loses the "seen" bit tracking that is done and ends up labeling way more > >commits as "not part of" ancestors. By sleeping for a second, the commit dates can > >be totally ordered reliably. > > This is going to be awkward to resolve as time_t only resolves > (portably) to 1 second intervals. I still would prefer the resolution > to be path-based rather than time-based. I certainly agree, but I'm not sure of the best way of doing that. Do we create/load a commit graph and use that for resolving insertion order into the commit heap? --Ben
On Friday, September 22, 2023 3:06 PM, Ben Boeckel wrote: >On Fri, Sep 22, 2023 at 14:49:58 -0400, rsbecker@nexbridge.com wrote: >> On Friday, September 22, 2023 2:44 PM, Ben Boeckel wrote: >> >Yes. It is explained that the commit date stored is only to 1 second >> >granularity. Since the commits are stored in commit-date, an equal >> >commit date ends up "twisting" the history and traversing some ancestors of >commits before the commits themsevles. >> >This loses the "seen" bit tracking that is done and ends up labeling >> >way more commits as "not part of" ancestors. By sleeping for a >> >second, the commit dates can be totally ordered reliably. >> >> This is going to be awkward to resolve as time_t only resolves >> (portably) to 1 second intervals. I still would prefer the resolution >> to be path-based rather than time-based. > >I certainly agree, but I'm not sure of the best way of doing that. Do we create/load a >commit graph and use that for resolving insertion order into the commit heap? I actually thought it worked that way. This may end up in a bigger change than fixing the issue because --first-parent does not appear to be sufficient to resolve the correct tag from your graph. My thought on using multiple commitish values to do that may help, but implementing that could lead to an O(n*m) scan (n=max commit tree width, m=depth to tag), plus a commitish hash lookup.
On Fri, Sep 22, 2023 at 13:14:30 -0400, rsbecker@nexbridge.com wrote: > There appears to be a merge at 446120fd88 which brings v9.3.0.rc0 > closer to HEAD than v9.3.0.rc1. I'll also note that `.rc0` was added as a fix for the situation of `.rc1` not being found properly. Without that, it finds `v9.2.6` as the "closest" tag. --Ben
diff --git a/builtin/describe.c b/builtin/describe.c index b28a4a1f82..5895d1af3a 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -264,8 +264,10 @@ static unsigned long finish_depth_computation( } if (!a) break; - } else + } else { best->depth++; + fprintf(stderr, "pushing depth of %s (finish_depth_computation): %d\n", oid_to_hex(&c->object.oid), best->depth); + } while (parents) { struct commit *p = parents->item; repo_parse_commit(the_repository, p); @@ -363,19 +365,24 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst) struct commit_list *parents = c->parents; struct commit_name **slot; + fprintf(stderr, "\n\nlooking at commit %s\n", oid_to_hex(&c->object.oid)); seen_commits++; slot = commit_names_peek(&commit_names, c); n = slot ? *slot : NULL; if (n) { if (!tags && !all && n->prio < 2) { + fprintf(stderr, "skipping unannotated tag %s\n", oid_to_hex(&c->object.oid)); unannotated_cnt++; } else if (match_cnt < max_candidates) { struct possible_tag *t = &all_matches[match_cnt++]; t->name = n; t->depth = seen_commits - 1; + fprintf(stderr, "depth of %s: %d\n", oid_to_hex(&c->object.oid), t->depth); t->flag_within = 1u << match_cnt; t->found_order = match_cnt; + fprintf(stderr, "find order of %s: %d\n", oid_to_hex(&c->object.oid), t->found_order); c->object.flags |= t->flag_within; + fprintf(stderr, "setting flag %x for commit %s\n", t->flag_within, oid_to_hex(&c->object.oid)); if (n->prio == 2) annotated_cnt++; } @@ -386,11 +393,15 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst) } for (cur_match = 0; cur_match < match_cnt; cur_match++) { struct possible_tag *t = &all_matches[cur_match]; - if (!(c->object.flags & t->flag_within)) + if (!(c->object.flags & t->flag_within)) { t->depth++; + fprintf(stderr, "flag for %s: %x\n", oid_to_hex(&c->object.oid), c->object.flags); + fprintf(stderr, "pushing depth of %s because of %s (flag_within): %d\n", oid_to_hex(&t->name->peeled), oid_to_hex(&c->object.oid), t->depth); + } } /* Stop if last remaining path already covered by best candidate(s) */ if (annotated_cnt && !list) { + fprintf(stderr, "checking for best candidate\n"); int best_depth = INT_MAX; unsigned best_within = 0; for (cur_match = 0; cur_match < match_cnt; cur_match++) { @@ -415,6 +426,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst) if (!(p->object.flags & SEEN)) commit_list_insert_by_date(p, &list); p->object.flags |= c->object.flags; + fprintf(stderr, "setting flag %x for commit %s due to ancestry\n", p->object.flags, oid_to_hex(&p->object.oid)); parents = parents->next; if (first_parent)