diff mbox series

describe: dont abort too early when searching tags

Message ID 20200223125102.6697-1-benno@bmevers.de (mailing list archive)
State New, archived
Headers show
Series describe: dont abort too early when searching tags | expand

Commit Message

Benno Evers Feb. 23, 2020, 12:51 p.m. UTC
When searching the commit graph for tag candidates, `git-describe`
will stop as soon as there is only one active branch left and
it already found an annotated tag as a candidate.

This works well as long as all branches eventually connect back
to a common root, but if the tags are found across branches
with no common ancestor

                  B
                  o----.
                        \
          o-----o---o----x
          A

it can happen that the search on one branch terminates prematurely
because a tag was found on another, independent branch. This scenario
isn't quite as obscure as it sounds, since cloning with a limited
depth often introduces many independent "dead ends" into the commit
graph.

The help text of `git-describe` states pretty clearly that when
describing a commit D, the number appended to the emitted tag X should
correspond to the number of commits found by `git log X..D`.

Thus, this commit modifies the stopping condition to only abort
the search when only one branch is left to search *and* all current
best candidates are descendants from that branch.

Signed-off-by: Benno Evers <benno@bmevers.de>
---
We originally found this issue in one of our internal CI jobs,
which relied on `git-describe` to assign filenames to the generated
artifacts.
---
 builtin/describe.c  | 22 +++++++++++++++----
 t/t6120-describe.sh | 53 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 70 insertions(+), 5 deletions(-)

Comments

Junio C Hamano Feb. 24, 2020, 8:52 p.m. UTC | #1
Benno Evers <benno@bmevers.de> writes:

> When searching the commit graph for tag candidates, `git-describe`
> will stop as soon as there is only one active branch left and
> it already found an annotated tag as a candidate.
>
> This works well as long as all branches eventually connect back
> to a common root, but if the tags are found across branches
> with no common ancestor
>
>                   B
>                   o----.
>                         \
>           o-----o---o----x
>           A
>
> it can happen that the search on one branch terminates prematurely
> because a tag was found on another, independent branch. This scenario
> isn't quite as obscure as it sounds, since cloning with a limited
> depth often introduces many independent "dead ends" into the commit
> graph.
>
> The help text of `git-describe` states pretty clearly that when
> describing a commit D, the number appended to the emitted tag X should
> correspond to the number of commits found by `git log X..D`.

To be fair, that description was written for the case in a normal
history with only a single root.

How much, if any, does this change hurt the performance by forcing
the code to dig further if there aren't multiple roots?  If there is
such an unnecessary overhead that degrades performance for the more
common case, can we improve it further to avoid it?

Thanks.
Benno Evers Feb. 25, 2020, 7:07 p.m. UTC | #2
From: Benno Evers <benno@bmevers.de>

> How much, if any, does this change hurt the performance by forcing
> the code to dig further if there aren't multiple roots?  If there is
> such an unnecessary overhead that degrades performance for the more
> common case, can we improve it further to avoid it?

If there aren't multiple roots, then this should be visiting exactly
the same number of commits as before. This is because in this case, if
we're down to a single branch, the current commit must be an ancestor
of *all* tag candidates so the stopping condition is always true.

It's actually a bit challenging to find good test repositories for
this in the wild, as the big ones that use a merging workflow (like git
itself, the kernel, etc.) usually have so many branches active at any
point in time that the search will only stop when it hits the max number
of candidates. So I might be missing some edge cases. However, for the
"normal" repositories that I tested the number of traversed commits was
always the same before and after the change.
Junio C Hamano Feb. 25, 2020, 8:10 p.m. UTC | #3
benno@bmevers.de writes:

> From: Benno Evers <benno@bmevers.de>
>
>> How much, if any, does this change hurt the performance by forcing
>> the code to dig further if there aren't multiple roots?  If there is
>> such an unnecessary overhead that degrades performance for the more
>> common case, can we improve it further to avoid it?
>
> If there aren't multiple roots, then this should be visiting exactly
> the same number of commits as before. This is because in this case, if
> we're down to a single branch, the current commit must be an ancestor
> of *all* tag candidates so the stopping condition is always true.

Sounds good.  Can some of that analysis go in the proposed log
message text, so that other people do not have to ask the same
question later?

Thanks.
diff mbox series

Patch

diff --git a/builtin/describe.c b/builtin/describe.c
index b6df81d8d0..420f4c6401 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -376,11 +376,25 @@  static void describe_commit(struct object_id *oid, struct strbuf *dst)
 			if (!(c->object.flags & t->flag_within))
 				t->depth++;
 		}
+		/* Stop if last remaining path already covered by best candidate(s) */
 		if (annotated_cnt && !list) {
-			if (debug)
-				fprintf(stderr, _("finished search at %s\n"),
-					oid_to_hex(&c->object.oid));
-			break;
+			int best_depth = INT_MAX;
+			unsigned best_within = 0;
+			for (cur_match = 0; cur_match < match_cnt; cur_match++) {
+				struct possible_tag *t = &all_matches[cur_match];
+				if (t->depth < best_depth) {
+					best_depth = t->depth;
+					best_within = t->flag_within;
+				} else if (t->depth == best_depth) {
+					best_within |= t->flag_within;
+				}
+			}
+			if ((c->object.flags & best_within) == best_within) {
+				if (debug)
+					fprintf(stderr, _("finished search at %s\n"),
+						oid_to_hex(&c->object.oid));
+				break;
+			}
 		}
 		while (parents) {
 			struct commit *p = parents->item;
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 09c50f3f04..d8cc08258e 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -479,4 +479,55 @@  test_expect_success 'name-rev covers all conditions while looking at parents' '
 	)
 '
 
-test_done
+#               B
+#               o
+#                \
+#  o-----o---o----x
+#        A
+#
+test_expect_success 'describe commits with disjoint bases' '
+	git init disjoint1 &&
+	(
+		cd disjoint1 &&
+
+		echo o >> file && git add file && git commit -m o &&
+		echo A >> file && git add file && git commit -m A &&
+		git tag A -a -m A &&
+		echo o >> file && git add file && git commit -m o &&
+
+		git checkout --orphan branch && rm file &&
+		echo B > file2 && git add file2 && git commit -m B &&
+		git tag B -a -m B &&
+		git merge --no-ff --allow-unrelated-histories master -m x &&
+
+		check_describe "A-3-*" HEAD
+	)
+'
+
+#           B
+#   o---o---o------------.
+#                         \
+#                  o---o---x
+#                  A
+#
+test_expect_success 'describe commits with disjoint bases 2' '
+	git init disjoint2 &&
+	(
+		cd disjoint2 &&
+
+		echo A >> file && git add file && GIT_COMMITTER_DATE="2020-01-01 18:00" git commit -m A &&
+		git tag A -a -m A &&
+		echo o >> file && git add file && GIT_COMMITTER_DATE="2020-01-01 18:01" git commit -m o &&
+
+		git checkout --orphan branch &&
+		echo o >> file2 && git add file2 && GIT_COMMITTER_DATE="2020-01-01 15:00" git commit -m o &&
+		echo o >> file2 && git add file2 && GIT_COMMITTER_DATE="2020-01-01 15:01" git commit -m o &&
+		echo B >> file2 && git add file2 && GIT_COMMITTER_DATE="2020-01-01 15:02" git commit -m B &&
+		git tag B -a -m B &&
+		git merge --no-ff --allow-unrelated-histories master -m x &&
+
+		check_describe "B-3-*" HEAD
+	)
+'
+
+test_done
\ No newline at end of file