[v2,07/15] rev-list: allow bitmaps when counting objects
diff mbox series

Message ID 20200214182222.GG150965@coredump.intra.peff.net
State New
Headers show
Series
  • combining object filters and bitmaps
Related show

Commit Message

Jeff King Feb. 14, 2020, 6:22 p.m. UTC
The prior commit taught "--count --objects" to work without bitmaps. We
should be able to get the same answer much more quickly with bitmaps.

Note that we punt on the max_count case here. This perhaps _could_ be
made to work if we find all of the boundary commits and treat them as
UNINTERESTING, subtracting them (and their reachable objects) from the
set we return. That implies an actual commit traversal, but we'd still
be faster due to avoiding opening up any trees. Given the complexity and
the fact that anyone is unlikely to want this, it makes sense to just
fall back to the non-bitmap case for now.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c      | 21 ++++++++++++++++++---
 t/t5310-pack-bitmaps.sh |  6 ++++++
 2 files changed, 24 insertions(+), 3 deletions(-)

Comments

Taylor Blau Feb. 15, 2020, 12:45 a.m. UTC | #1
On Fri, Feb 14, 2020 at 01:22:22PM -0500, Jeff King wrote:
> The prior commit taught "--count --objects" to work without bitmaps. We
> should be able to get the same answer much more quickly with bitmaps.
>
> Note that we punt on the max_count case here. This perhaps _could_ be
> made to work if we find all of the boundary commits and treat them as
> UNINTERESTING, subtracting them (and their reachable objects) from the
> set we return. That implies an actual commit traversal, but we'd still
> be faster due to avoiding opening up any trees. Given the complexity and
> the fact that anyone is unlikely to want this, it makes sense to just
> fall back to the non-bitmap case for now.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/rev-list.c      | 21 ++++++++++++++++++---
>  t/t5310-pack-bitmaps.sh |  6 ++++++
>  2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 9452123988..70f3207ecc 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -374,7 +374,10 @@ static inline int parse_missing_action_value(const char *value)
>
>  static int try_bitmap_count(struct rev_info *revs)
>  {
> -	uint32_t commit_count;
> +	uint32_t commit_count = 0,
> +		 tag_count = 0,
> +		 tree_count = 0,
> +		 blob_count = 0;

Hmm, I don't usually see the comma-separated declaration/initialization
in git.git. Is there a reason you did it here? Not that I really mind
one way or the other, just interested.

>  	int max_count;
>  	struct bitmap_index *bitmap_git;
>
> @@ -389,6 +392,15 @@ static int try_bitmap_count(struct rev_info *revs)
>  	if (revs->left_right || revs->cherry_mark)
>  		return -1;
>
> +	/*
> +	 * If we're counting reachable objects, we can't handle a max count of
> +	 * commits to traverse, since we don't know which objects go with which
> +	 * commit.
> +	 */
> +	if (revs->max_count >= 0 &&
> +	    (revs->tag_objects || revs->tree_objects || revs->blob_objects))

An aside unrelated to the patch at hand: the expression

  (revs->tag_objects || revs->tree_objects || revs->blob_objects)

does occur in an awful lot of places throughout this file. Do you
imagine it'd be useful to pull this check out into its own function,
perhaps as a preparatory patch in a later version of this series?

I'm also not fussed if you don't think that such a change would be
useful, it's just an observation I had after seeing this expression a
few times.

> +		return -1;
> +
>  	/*
>  	 * This must be saved before doing any walking, since the revision
>  	 * machinery will count it down to zero while traversing.
> @@ -399,11 +411,14 @@ static int try_bitmap_count(struct rev_info *revs)
>  	if (!bitmap_git)
>  		return -1;
>
> -	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
> +	count_bitmap_commit_list(bitmap_git, &commit_count,
> +				 revs->tree_objects ? &tree_count : NULL,
> +				 revs->blob_objects ? &blob_count : NULL,
> +				 revs->tag_objects ? &tag_count : NULL);
>  	if (max_count >= 0 && max_count < commit_count)
>  		commit_count = max_count;
>
> -	printf("%d\n", commit_count);
> +	printf("%d\n", commit_count + tree_count + blob_count + tag_count);
>  	free_bitmap_index(bitmap_git);
>  	return 0;
>  }
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 6640329ebf..7ba7d294a5 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -74,6 +74,12 @@ rev_list_tests() {
>  		test_cmp expect actual
>  	'
>
> +	test_expect_success "counting objects via bitmap ($state)" '
> +		git rev-list --count --objects HEAD >expect &&
> +		git rev-list --use-bitmap-index --count --objects HEAD >actual &&
> +		test_cmp expect actual
> +	'
> +
>  	test_expect_success "enumerate --objects ($state)" '
>  		git rev-list --objects --use-bitmap-index HEAD >tmp &&
>  		cut -d" " -f1 <tmp >tmp2 &&
> --
> 2.25.0.796.gcc29325708

Your tests look good to me, too.

Thanks,
Taylor
Jeff King Feb. 15, 2020, 6:55 a.m. UTC | #2
On Fri, Feb 14, 2020 at 04:45:55PM -0800, Taylor Blau wrote:

> >  static int try_bitmap_count(struct rev_info *revs)
> >  {
> > -	uint32_t commit_count;
> > +	uint32_t commit_count = 0,
> > +		 tag_count = 0,
> > +		 tree_count = 0,
> > +		 blob_count = 0;
> 
> Hmm, I don't usually see the comma-separated declaration/initialization
> in git.git. Is there a reason you did it here? Not that I really mind
> one way or the other, just interested.

The variables are all related, and all should have the same type. I'd
complain about a patch that did:

  int ret, count;

because there's no logical reason those two variables have the same
type. They just happen to. And putting them both on the same line is
even worse, because it makes a diff changing one of them noisy.

But in the code quoted above, if one of them changes, they would all
(presumably) change. So I think it communicates something to group them
like this. That said, if we had a style guideline that strictly forbade
this (because it's often applied in _bad_ spots), I wouldn't be that
sad to convert it.

> > +	if (revs->max_count >= 0 &&
> > +	    (revs->tag_objects || revs->tree_objects || revs->blob_objects))
> 
> An aside unrelated to the patch at hand: the expression
> 
>   (revs->tag_objects || revs->tree_objects || revs->blob_objects)
> 
> does occur in an awful lot of places throughout this file. Do you
> imagine it'd be useful to pull this check out into its own function,
> perhaps as a preparatory patch in a later version of this series?
> 
> I'm also not fussed if you don't think that such a change would be
> useful, it's just an observation I had after seeing this expression a
> few times.

The most amusing part about it is that there's actually no way to
specify one type without the other. Most of the revision code tries to
be respectful of the fact that we might one day change that (though
there are some subtleties; e.g., asking for blobs but not trees would
still need to traverse the trees, but just not show them).

So this really could have been revs->objects. :) But I think given the
effort to treat them individually so far, it would be a step backwards
to switch now.

As far as whether there should be something like:

  int revs_show_objects(struct rev_info *revs)
  {
	return revs->tag_objects || revs->tree_objects || revs->blob_objects;
  }

I don't feel strongly either way. I find doing it inline perfectly
readable, but it's entirely possible my brain has been rotted by years
of looking at the revision code.

-Peff
Junio C Hamano Feb. 16, 2020, 11:36 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

>> > +	uint32_t commit_count = 0,
>> > +		 tag_count = 0,
>> > +		 tree_count = 0,
>> > +		 blob_count = 0;
>> 
>> Hmm, I don't usually see the comma-separated declaration/initialization
>> in git.git. Is there a reason you did it here? Not that I really mind
>> one way or the other, just interested.
>
> The variables are all related, and all should have the same type. I'd
> complain about a patch that did:
>
>   int ret, count;
>
> because there's no logical reason those two variables have the same
> type. They just happen to. And putting them both on the same line is
> even worse, because it makes a diff changing one of them noisy.
>
> But in the code quoted above, if one of them changes, they would all
> (presumably) change. So I think it communicates something to group them
> like this.

I often apply exactly the same criteria as above to my code and
review---since it is not just you (or me), perhaps CodingGuideline
can help other readers, but I am OK to delay documenting it until we
find the third person who has been applying this rule that has not
been spelt out explicitly ;-)

Patch
diff mbox series

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9452123988..70f3207ecc 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -374,7 +374,10 @@  static inline int parse_missing_action_value(const char *value)
 
 static int try_bitmap_count(struct rev_info *revs)
 {
-	uint32_t commit_count;
+	uint32_t commit_count = 0,
+		 tag_count = 0,
+		 tree_count = 0,
+		 blob_count = 0;
 	int max_count;
 	struct bitmap_index *bitmap_git;
 
@@ -389,6 +392,15 @@  static int try_bitmap_count(struct rev_info *revs)
 	if (revs->left_right || revs->cherry_mark)
 		return -1;
 
+	/*
+	 * If we're counting reachable objects, we can't handle a max count of
+	 * commits to traverse, since we don't know which objects go with which
+	 * commit.
+	 */
+	if (revs->max_count >= 0 &&
+	    (revs->tag_objects || revs->tree_objects || revs->blob_objects))
+		return -1;
+
 	/*
 	 * This must be saved before doing any walking, since the revision
 	 * machinery will count it down to zero while traversing.
@@ -399,11 +411,14 @@  static int try_bitmap_count(struct rev_info *revs)
 	if (!bitmap_git)
 		return -1;
 
-	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
+	count_bitmap_commit_list(bitmap_git, &commit_count,
+				 revs->tree_objects ? &tree_count : NULL,
+				 revs->blob_objects ? &blob_count : NULL,
+				 revs->tag_objects ? &tag_count : NULL);
 	if (max_count >= 0 && max_count < commit_count)
 		commit_count = max_count;
 
-	printf("%d\n", commit_count);
+	printf("%d\n", commit_count + tree_count + blob_count + tag_count);
 	free_bitmap_index(bitmap_git);
 	return 0;
 }
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 6640329ebf..7ba7d294a5 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -74,6 +74,12 @@  rev_list_tests() {
 		test_cmp expect actual
 	'
 
+	test_expect_success "counting objects via bitmap ($state)" '
+		git rev-list --count --objects HEAD >expect &&
+		git rev-list --use-bitmap-index --count --objects HEAD >actual &&
+		test_cmp expect actual
+	'
+
 	test_expect_success "enumerate --objects ($state)" '
 		git rev-list --objects --use-bitmap-index HEAD >tmp &&
 		cut -d" " -f1 <tmp >tmp2 &&