diff mbox series

[v2,1/1] builtin/pack-objects.c: avoid iterating all refs

Message ID 20210120124514.49737-2-jacob@gitlab.com (mailing list archive)
State Accepted
Commit be18153b975844f8792b03e337f1a4c86fe87531
Headers show
Series builtin/pack-objects.c: avoid iterating all refs | expand

Commit Message

Jacob Vosmaer Jan. 20, 2021, 12:45 p.m. UTC
In git-pack-objects, we iterate over all the tags if the --include-tag
option is passed on the command line. For some reason this uses
for_each_ref which is expensive if the repo has many refs. We should
use for_each_tag_ref instead.

Because the add_ref_tag callback will now only visit tags we
simplified it a bit.

The motivation for this change is that we observed performance issues
with a repository on gitlab.com that has 500,000 refs but only 2,000
tags. The fetch traffic on that repo is dominated by CI, and when we
changed CI to fetch with 'git fetch --no-tags' we saw a dramatic
change in the CPU profile of git-pack-objects. This lead us to this
particular ref walk. More details in:
https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598

Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
---
 builtin/pack-objects.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comments

Jacob Vosmaer Jan. 20, 2021, 6:49 p.m. UTC | #1
I also spent time being confused about what is going on, and wondering
if that other ref iteration ever worked. I went as far as inspecting
git-verify-pack -v output to look at object order. :)

On the other hand, through working on this I learned that include-tag
has pretty effective test coverage so if peel_ref didn't work or
stopped working, we'd find out.

If there is a better way to write "for each tag + peel ref" I am happy
to change the patch, just let me know what it should look like.

Best regards,

Jacob Vosmaer
GitLab, Inc.

On Wed, Jan 20, 2021 at 5:19 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Wed, Jan 20, 2021 at 11:18:11AM -0500, Jeff King wrote:
> > So I think both the existing and the new calls using for_each_tag_ref()
> > are OK here.
>
> Indeed, I followed the same trail of calls as you did and reached the
> same conclusion, but didn't write any of it down here since I thought it
> wasn't worthwhile.
>
> But, yes, I agree that both are safe.
>
> Thanks,
> Taylor
Jeff King Jan. 20, 2021, 7:45 p.m. UTC | #2
On Wed, Jan 20, 2021 at 11:19:41AM -0500, Taylor Blau wrote:

> On Wed, Jan 20, 2021 at 11:18:11AM -0500, Jeff King wrote:
> > So I think both the existing and the new calls using for_each_tag_ref()
> > are OK here.
> 
> Indeed, I followed the same trail of calls as you did and reached the
> same conclusion, but didn't write any of it down here since I thought it
> wasn't worthwhile.
> 
> But, yes, I agree that both are safe.

OK. I think it's worth saying such things (in case that was not obvious.
;) ).

I also rage-replaced peel_ref() with a function taking an oid so we
never have to do that digging again. Posted separately in its own
thread.

-Peff
Jacob Vosmaer Jan. 20, 2021, 9:46 p.m. UTC | #3
On Wed, Jan 20, 2021 at 8:45 PM Jeff King <peff@peff.net> wrote:

> I also rage-replaced peel_ref() with a function taking an oid so we
> never have to do that digging again. Posted separately in its own
> thread.

That sounds like a good solution. Where does that leave my patch? Do I
need to wait for that new commit somehow? I don't know how that works,
"contributing to Git" wise.
Taylor Blau Jan. 20, 2021, 9:52 p.m. UTC | #4
On Wed, Jan 20, 2021 at 10:46:29PM +0100, Jacob Vosmaer wrote:
> On Wed, Jan 20, 2021 at 8:45 PM Jeff King <peff@peff.net> wrote:
>
> > I also rage-replaced peel_ref() with a function taking an oid so we
> > never have to do that digging again. Posted separately in its own
> > thread.
>
> That sounds like a good solution. Where does that leave my patch? Do I
> need to wait for that new commit somehow? I don't know how that works,
> "contributing to Git" wise.

Peff noted in the patch to nuke "peel_ref()" that merging his and then
yours will cause a merge conflict, but it is trivial to resolve. So, I'd
expect that Junio will pick up both and resolve the merge conflict when
queuing.

Or, in other words, I don't think there's anything left on your part
;-). Thanks for providing your patches; there was a nice digression and
I'm quite happy with where we ended up.

Thanks,
Taylor
Jeff King Jan. 21, 2021, 2:54 a.m. UTC | #5
On Wed, Jan 20, 2021 at 10:46:29PM +0100, Jacob Vosmaer wrote:

> On Wed, Jan 20, 2021 at 8:45 PM Jeff King <peff@peff.net> wrote:
> 
> > I also rage-replaced peel_ref() with a function taking an oid so we
> > never have to do that digging again. Posted separately in its own
> > thread.
> 
> That sounds like a good solution. Where does that leave my patch? Do I
> need to wait for that new commit somehow? I don't know how that works,
> "contributing to Git" wise.

Nope, you shouldn't need to do anything. The conflict is minimal
enough that the two can proceed independently, and the maintainer can
resolve it.

-Peff
Jacob Vosmaer Jan. 22, 2021, 4:46 p.m. UTC | #6
Great, thanks!

Best regards,

Jacob Vosmaer
GitLab, Inc.

On Thu, Jan 21, 2021 at 3:54 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jan 20, 2021 at 10:46:29PM +0100, Jacob Vosmaer wrote:
>
> > On Wed, Jan 20, 2021 at 8:45 PM Jeff King <peff@peff.net> wrote:
> >
> > > I also rage-replaced peel_ref() with a function taking an oid so we
> > > never have to do that digging again. Posted separately in its own
> > > thread.
> >
> > That sounds like a good solution. Where does that leave my patch? Do I
> > need to wait for that new commit somehow? I don't know how that works,
> > "contributing to Git" wise.
>
> Nope, you shouldn't need to do anything. The conflict is minimal
> enough that the two can proceed independently, and the maintainer can
> resolve it.
>
> -Peff
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2a00358f34..ad52c91bdb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2803,13 +2803,11 @@  static void add_tag_chain(const struct object_id *oid)
 	}
 }
 
-static int add_ref_tag(const char *path, const struct object_id *oid, int flag, void *cb_data)
+static int add_ref_tag(const char *tag, const struct object_id *oid, int flag, void *cb_data)
 {
 	struct object_id peeled;
 
-	if (starts_with(path, "refs/tags/") && /* is a tag? */
-	    !peel_ref(path, &peeled)    && /* peelable? */
-	    obj_is_packed(&peeled)) /* object packed? */
+	if (!peel_ref(tag, &peeled) && obj_is_packed(&peeled))
 		add_tag_chain(oid);
 	return 0;
 }
@@ -3740,7 +3738,7 @@  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	}
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
-		for_each_ref(add_ref_tag, NULL);
+		for_each_tag_ref(add_ref_tag, NULL);
 	stop_progress(&progress_state);
 	trace2_region_leave("pack-objects", "enumerate-objects",
 			    the_repository);