[1/1] commit-reach: properly peel tags
diff mbox series

Message ID 948e222228d2f2868b85a425142382e63a17773a.1536762173.git.gitgitgadget@gmail.com
State New
Headers show
Series
  • Properly peel tags in can_all_from_reach_with_flags()
Related show

Commit Message

Ben Keene via GitGitGadget Sept. 12, 2018, 2:22 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e
"commit-reach: make can_all_from_reach... linear" but incorrectly
assumed that all objects provided were commits. During a fetch
negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags
to the 'from' array. The current code creates a segfault.

Add a direct call to can_all_from_reach_with_flag() in 'test-tool reach'
and add a test in t6600-test-reach.sh that demonstrates this segfault.

Correct the issue by peeling tags when investigating the initial list
of objects in the 'from' array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c        | 25 ++++++++++++++++++-------
 t/helper/test-reach.c | 22 +++++++++++++++++-----
 t/t6600-test-reach.sh | 30 ++++++++++++++++++++++++++++--
 3 files changed, 63 insertions(+), 14 deletions(-)

Comments

Jeff King Sept. 12, 2018, 7:41 p.m. UTC | #1
On Wed, Sep 12, 2018 at 07:22:56AM -0700, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e
> "commit-reach: make can_all_from_reach... linear" but incorrectly
> assumed that all objects provided were commits. During a fetch
> negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags
> to the 'from' array. The current code creates a segfault.
> 
> Add a direct call to can_all_from_reach_with_flag() in 'test-tool reach'
> and add a test in t6600-test-reach.sh that demonstrates this segfault.

Thanks, that makes a lot of sense for reproducing. I almost wonder if
the whole X_array of commits in test-reach could just go away, and we'd
feed whatever list of objects the caller happens to send. That may make
it simpler to include non-commit objects in a variety of tests.

That said, I didn't look closely at other fallout in the program from
that, so I'll defer to your judgement.

> Correct the issue by peeling tags when investigating the initial list
> of objects in the 'from' array.
> 
> Signed-off-by: Jeff King <peff@peff.net>

I'm not sure if this should just be Reported-by, since I don't know that
I actually contributed any code. ;) But for anything I might have
contributed, certainly you have my signoff.

>  	for (i = 0; i < from->nr; i++) {
> -		list[i] = (struct commit *)from->objects[i].item;
> +		struct object *from_one = from->objects[i].item;
>  
> -		if (parse_commit(list[i]) ||
> -		    list[i]->generation < min_generation)
> -			return 0;
> +		from_one = deref_tag(the_repository, from_one,
> +				     "a from object", 0);
> +		if (!from_one || from_one->type != OBJ_COMMIT) {
> +			from->objects[i].item->flags |= assign_flag;
> +			continue;
> +		}

I didn't resurrect the comment from this conditional that was in the
original code (mostly because I wasn't sure if the reasoning was still
entirely valid, or what setting the flag here actually means). But it's
probably worth saying something here to explain why it's OK to punt on
this case, and what it means to just set the flag.

> [...]

The rest of the patch looks sane to me. I didn't go through the trouble
to reproduce the segfault with the test, but it sounds like you did.

-Peff
Junio C Hamano Sept. 12, 2018, 9:23 p.m. UTC | #2
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/commit-reach.c b/commit-reach.c
> index 86715c103c..6de72c6e03 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -544,20 +544,31 @@ int can_all_from_reach_with_flag(struct object_array *from,
>  {
>  	struct commit **list = NULL;
>  	int i;
> +	int nr_commits;
>  	int result = 1;
>  
>  	ALLOC_ARRAY(list, from->nr);
> +	nr_commits = 0;
>  	for (i = 0; i < from->nr; i++) {
> -		list[i] = (struct commit *)from->objects[i].item;
> +		struct object *from_one = from->objects[i].item;
>  
> -		if (parse_commit(list[i]) ||
> -		    list[i]->generation < min_generation)
> -			return 0;
> +		from_one = deref_tag(the_repository, from_one,
> +				     "a from object", 0);
> +		if (!from_one || from_one->type != OBJ_COMMIT) {
> +			from->objects[i].item->flags |= assign_flag;

I wondered why this is not futzing with "from_one->flags"; by going
back to the original from->objects[] array, the code is setting the
flags on the original tag object and not the non-commit object that
was pointed at by the tag.

> +			continue;
> +		}
> +
> +		list[nr_commits] = (struct commit *)from_one;
> +		if (parse_commit(list[nr_commits]) ||
> +		    list[nr_commits]->generation < min_generation)
> +			return 0; /* is this a leak? */
> +		nr_commits++;
>  	}

In the original code, the flags bits were left unchanged if the loop
terminated by hitting a commit whose generation is too young (or
parse_commit() returns non-zero).  With this updated code, flags bit
can be modified before the code notices the situation and leave the
function, bypassing the "cleanup" we see below that clears the
"assign_flag" bits.

Would it be a problem that we return early without cleaning up?

Even if we do not call this early return, the assign_flag bits added
to the original tag in from->objects[i].item won't be cleaned in
this new code, as "cleanup:" section now loops over the list[] that
omits the object whose flags was smudged above before the "continue".

Would it be a problem that we leave the assign_flags without
cleaning up?

> -	QSORT(list, from->nr, compare_commits_by_gen);
> +	QSORT(list, nr_commits, compare_commits_by_gen);
>  
> -	for (i = 0; i < from->nr; i++) {
> +	for (i = 0; i < nr_commits; i++) {
>  		/* DFS from list[i] */
>  		struct commit_list *stack = NULL;
>  
> @@ -600,7 +611,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
>  	}
>  
>  cleanup:
> -	for (i = 0; i < from->nr; i++) {
> +	for (i = 0; i < nr_commits; i++) {
>  		clear_commit_marks(list[i], RESULT);
>  		clear_commit_marks(list[i], assign_flag);
>  	}
Jeff King Sept. 12, 2018, 9:34 p.m. UTC | #3
On Wed, Sep 12, 2018 at 02:23:42PM -0700, Junio C Hamano wrote:

> > diff --git a/commit-reach.c b/commit-reach.c
> > index 86715c103c..6de72c6e03 100644
> > --- a/commit-reach.c
> > +++ b/commit-reach.c
> > @@ -544,20 +544,31 @@ int can_all_from_reach_with_flag(struct object_array *from,
> >  {
> >  	struct commit **list = NULL;
> >  	int i;
> > +	int nr_commits;
> >  	int result = 1;
> >  
> >  	ALLOC_ARRAY(list, from->nr);
> > +	nr_commits = 0;
> >  	for (i = 0; i < from->nr; i++) {
> > -		list[i] = (struct commit *)from->objects[i].item;
> > +		struct object *from_one = from->objects[i].item;
> >  
> > -		if (parse_commit(list[i]) ||
> > -		    list[i]->generation < min_generation)
> > -			return 0;
> > +		from_one = deref_tag(the_repository, from_one,
> > +				     "a from object", 0);
> > +		if (!from_one || from_one->type != OBJ_COMMIT) {
> > +			from->objects[i].item->flags |= assign_flag;
> 
> I wondered why this is not futzing with "from_one->flags"; by going
> back to the original from->objects[] array, the code is setting the
> flags on the original tag object and not the non-commit object that
> was pointed at by the tag.

Note that from_one may even be NULL.

> > +			continue;
> > +		}
> > +
> > +		list[nr_commits] = (struct commit *)from_one;
> > +		if (parse_commit(list[nr_commits]) ||
> > +		    list[nr_commits]->generation < min_generation)
> > +			return 0; /* is this a leak? */
> > +		nr_commits++;
> >  	}
> 
> In the original code, the flags bits were left unchanged if the loop
> terminated by hitting a commit whose generation is too young (or
> parse_commit() returns non-zero).  With this updated code, flags bit
> can be modified before the code notices the situation and leave the
> function, bypassing the "cleanup" we see below that clears the
> "assign_flag" bits.
> 
> Would it be a problem that we return early without cleaning up?
> 
> Even if we do not call this early return, the assign_flag bits added
> to the original tag in from->objects[i].item won't be cleaned in
> this new code, as "cleanup:" section now loops over the list[] that
> omits the object whose flags was smudged above before the "continue".
> 
> Would it be a problem that we leave the assign_flags without
> cleaning up?

Yeah, I hadn't thought about the bit cleanup when making my original
suggestion. In the original code (before 4fbcca4eff), I think we did set
flags as we iterated through the loop, and we could still do an early
return when we hit "!reachable(...)". But I don't see any cleanup of
assign_flag there at all.

So I guess I'm pretty confused about what the semantics are supposed to
be.

-Peff

Patch
diff mbox series

diff --git a/commit-reach.c b/commit-reach.c
index 86715c103c..6de72c6e03 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -544,20 +544,31 @@  int can_all_from_reach_with_flag(struct object_array *from,
 {
 	struct commit **list = NULL;
 	int i;
+	int nr_commits;
 	int result = 1;
 
 	ALLOC_ARRAY(list, from->nr);
+	nr_commits = 0;
 	for (i = 0; i < from->nr; i++) {
-		list[i] = (struct commit *)from->objects[i].item;
+		struct object *from_one = from->objects[i].item;
 
-		if (parse_commit(list[i]) ||
-		    list[i]->generation < min_generation)
-			return 0;
+		from_one = deref_tag(the_repository, from_one,
+				     "a from object", 0);
+		if (!from_one || from_one->type != OBJ_COMMIT) {
+			from->objects[i].item->flags |= assign_flag;
+			continue;
+		}
+
+		list[nr_commits] = (struct commit *)from_one;
+		if (parse_commit(list[nr_commits]) ||
+		    list[nr_commits]->generation < min_generation)
+			return 0; /* is this a leak? */
+		nr_commits++;
 	}
 
-	QSORT(list, from->nr, compare_commits_by_gen);
+	QSORT(list, nr_commits, compare_commits_by_gen);
 
-	for (i = 0; i < from->nr; i++) {
+	for (i = 0; i < nr_commits; i++) {
 		/* DFS from list[i] */
 		struct commit_list *stack = NULL;
 
@@ -600,7 +611,7 @@  int can_all_from_reach_with_flag(struct object_array *from,
 	}
 
 cleanup:
-	for (i = 0; i < from->nr; i++) {
+	for (i = 0; i < nr_commits; i++) {
 		clear_commit_marks(list[i], RESULT);
 		clear_commit_marks(list[i], assign_flag);
 	}
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index eb21103998..08d2ea68e8 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -31,6 +31,7 @@  int cmd__reach(int ac, const char **av)
 	struct object_id oid_A, oid_B;
 	struct commit *A, *B;
 	struct commit_list *X, *Y;
+	struct object_array X_obj = OBJECT_ARRAY_INIT;
 	struct commit **X_array;
 	int X_nr, X_alloc;
 	struct strbuf buf = STRBUF_INIT;
@@ -49,7 +50,8 @@  int cmd__reach(int ac, const char **av)
 
 	while (strbuf_getline(&buf, stdin) != EOF) {
 		struct object_id oid;
-		struct object *o;
+		struct object *orig;
+		struct object *peeled;
 		struct commit *c;
 		if (buf.len < 3)
 			continue;
@@ -57,14 +59,14 @@  int cmd__reach(int ac, const char **av)
 		if (get_oid_committish(buf.buf + 2, &oid))
 			die("failed to resolve %s", buf.buf + 2);
 
-		o = parse_object(r, &oid);
-		o = deref_tag_noverify(o);
+		orig = parse_object(r, &oid);
+		peeled = deref_tag_noverify(orig);
 
-		if (!o)
+		if (!peeled)
 			die("failed to load commit for input %s resulting in oid %s\n",
 			    buf.buf, oid_to_hex(&oid));
 
-		c = object_as_type(r, o, OBJ_COMMIT, 0);
+		c = object_as_type(r, peeled, OBJ_COMMIT, 0);
 
 		if (!c)
 			die("failed to load commit for input %s resulting in oid %s\n",
@@ -85,6 +87,7 @@  int cmd__reach(int ac, const char **av)
 				commit_list_insert(c, &X);
 				ALLOC_GROW(X_array, X_nr + 1, X_alloc);
 				X_array[X_nr++] = c;
+				add_object_array(orig, NULL, &X_obj);
 				break;
 
 			case 'Y':
@@ -113,6 +116,15 @@  int cmd__reach(int ac, const char **av)
 		print_sorted_commit_ids(list);
 	} else if (!strcmp(av[1], "can_all_from_reach")) {
 		printf("%s(X,Y):%d\n", av[1], can_all_from_reach(X, Y, 1));
+	} else if (!strcmp(av[1], "can_all_from_reach_with_flag")) {
+		struct commit_list *iter = Y;
+
+		while (iter) {
+			iter->item->object.flags |= 2;
+			iter = iter->next;
+		}
+
+		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
 	} else if (!strcmp(av[1], "commit_contains")) {
 		struct ref_filter filter;
 		struct contains_cache cache;
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index d139a00d1d..f7bf82290b 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -31,7 +31,8 @@  test_expect_success 'setup' '
 	for i in $(test_seq 1 10)
 	do
 		test_commit "1-$i" &&
-		git branch -f commit-1-$i
+		git branch -f commit-1-$i &&
+		git tag -a -m "1-$i" tag-1-$i commit-1-$i
 	done &&
 	for j in $(test_seq 1 9)
 	do
@@ -39,11 +40,13 @@  test_expect_success 'setup' '
 		x=$(($j + 1)) &&
 		test_commit "$x-1" &&
 		git branch -f commit-$x-1 &&
+		git tag -a -m "$x-1" tag-$x-1 commit-$x-1 &&
 
 		for i in $(test_seq 2 10)
 		do
 			git merge commit-$j-$i -m "$x-$i" &&
-			git branch -f commit-$x-$i
+			git branch -f commit-$x-$i &&
+			git tag -a -m "$x-$i" tag-$x-$i commit-$x-$i
 		done
 	done &&
 	git commit-graph write --reachable &&
@@ -205,6 +208,29 @@  test_expect_success 'can_all_from_reach:miss' '
 	test_three_modes can_all_from_reach
 '
 
+test_expect_success 'can_all_from_reach_with_flag: tags case' '
+	cat >input <<-\EOF &&
+	X:tag-2-10
+	X:tag-3-9
+	X:tag-4-8
+	X:commit-5-7
+	X:commit-6-6
+	X:commit-7-5
+	X:commit-8-4
+	X:commit-9-3
+	Y:tag-1-9
+	Y:tag-2-8
+	Y:tag-3-7
+	Y:commit-4-6
+	Y:commit-5-5
+	Y:commit-6-4
+	Y:commit-7-3
+	Y:commit-8-1
+	EOF
+	echo "can_all_from_reach_with_flag(X,_,_,0,0):1" >expect &&
+	test_three_modes can_all_from_reach_with_flag
+'
+	
 test_expect_success 'commit_contains:hit' '
 	cat >input <<-\EOF &&
 	A:commit-7-7