Message ID | 20210120124514.49737-2-jacob@gitlab.com (mailing list archive) |
---|---|
State | Accepted |
Commit | be18153b975844f8792b03e337f1a4c86fe87531 |
Headers | show |
Series | builtin/pack-objects.c: avoid iterating all refs | expand |
I also spent time being confused about what is going on, and wondering if that other ref iteration ever worked. I went as far as inspecting git-verify-pack -v output to look at object order. :) On the other hand, through working on this I learned that include-tag has pretty effective test coverage so if peel_ref didn't work or stopped working, we'd find out. If there is a better way to write "for each tag + peel ref" I am happy to change the patch, just let me know what it should look like. Best regards, Jacob Vosmaer GitLab, Inc. On Wed, Jan 20, 2021 at 5:19 PM Taylor Blau <me@ttaylorr.com> wrote: > > On Wed, Jan 20, 2021 at 11:18:11AM -0500, Jeff King wrote: > > So I think both the existing and the new calls using for_each_tag_ref() > > are OK here. > > Indeed, I followed the same trail of calls as you did and reached the > same conclusion, but didn't write any of it down here since I thought it > wasn't worthwhile. > > But, yes, I agree that both are safe. > > Thanks, > Taylor
On Wed, Jan 20, 2021 at 11:19:41AM -0500, Taylor Blau wrote: > On Wed, Jan 20, 2021 at 11:18:11AM -0500, Jeff King wrote: > > So I think both the existing and the new calls using for_each_tag_ref() > > are OK here. > > Indeed, I followed the same trail of calls as you did and reached the > same conclusion, but didn't write any of it down here since I thought it > wasn't worthwhile. > > But, yes, I agree that both are safe. OK. I think it's worth saying such things (in case that was not obvious. ;) ). I also rage-replaced peel_ref() with a function taking an oid so we never have to do that digging again. Posted separately in its own thread. -Peff
On Wed, Jan 20, 2021 at 8:45 PM Jeff King <peff@peff.net> wrote: > I also rage-replaced peel_ref() with a function taking an oid so we > never have to do that digging again. Posted separately in its own > thread. That sounds like a good solution. Where does that leave my patch? Do I need to wait for that new commit somehow? I don't know how that works, "contributing to Git" wise.
On Wed, Jan 20, 2021 at 10:46:29PM +0100, Jacob Vosmaer wrote: > On Wed, Jan 20, 2021 at 8:45 PM Jeff King <peff@peff.net> wrote: > > > I also rage-replaced peel_ref() with a function taking an oid so we > > never have to do that digging again. Posted separately in its own > > thread. > > That sounds like a good solution. Where does that leave my patch? Do I > need to wait for that new commit somehow? I don't know how that works, > "contributing to Git" wise. Peff noted in the patch to nuke "peel_ref()" that merging his and then yours will cause a merge conflict, but it is trivial to resolve. So, I'd expect that Junio will pick up both and resolve the merge conflict when queuing. Or, in other words, I don't think there's anything left on your part ;-). Thanks for providing your patches; there was a nice digression and I'm quite happy with where we ended up. Thanks, Taylor
On Wed, Jan 20, 2021 at 10:46:29PM +0100, Jacob Vosmaer wrote: > On Wed, Jan 20, 2021 at 8:45 PM Jeff King <peff@peff.net> wrote: > > > I also rage-replaced peel_ref() with a function taking an oid so we > > never have to do that digging again. Posted separately in its own > > thread. > > That sounds like a good solution. Where does that leave my patch? Do I > need to wait for that new commit somehow? I don't know how that works, > "contributing to Git" wise. Nope, you shouldn't need to do anything. The conflict is minimal enough that the two can proceed independently, and the maintainer can resolve it. -Peff
Great, thanks! Best regards, Jacob Vosmaer GitLab, Inc. On Thu, Jan 21, 2021 at 3:54 AM Jeff King <peff@peff.net> wrote: > > On Wed, Jan 20, 2021 at 10:46:29PM +0100, Jacob Vosmaer wrote: > > > On Wed, Jan 20, 2021 at 8:45 PM Jeff King <peff@peff.net> wrote: > > > > > I also rage-replaced peel_ref() with a function taking an oid so we > > > never have to do that digging again. Posted separately in its own > > > thread. > > > > That sounds like a good solution. Where does that leave my patch? Do I > > need to wait for that new commit somehow? I don't know how that works, > > "contributing to Git" wise. > > Nope, you shouldn't need to do anything. The conflict is minimal > enough that the two can proceed independently, and the maintainer can > resolve it. > > -Peff
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 2a00358f34..ad52c91bdb 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2803,13 +2803,11 @@ static void add_tag_chain(const struct object_id *oid) } } -static int add_ref_tag(const char *path, const struct object_id *oid, int flag, void *cb_data) +static int add_ref_tag(const char *tag, const struct object_id *oid, int flag, void *cb_data) { struct object_id peeled; - if (starts_with(path, "refs/tags/") && /* is a tag? */ - !peel_ref(path, &peeled) && /* peelable? */ - obj_is_packed(&peeled)) /* object packed? */ + if (!peel_ref(tag, &peeled) && obj_is_packed(&peeled)) add_tag_chain(oid); return 0; } @@ -3740,7 +3738,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) } cleanup_preferred_base(); if (include_tag && nr_result) - for_each_ref(add_ref_tag, NULL); + for_each_tag_ref(add_ref_tag, NULL); stop_progress(&progress_state); trace2_region_leave("pack-objects", "enumerate-objects", the_repository);
In git-pack-objects, we iterate over all the tags if the --include-tag option is passed on the command line. For some reason this uses for_each_ref which is expensive if the repo has many refs. We should use for_each_tag_ref instead. Because the add_ref_tag callback will now only visit tags we simplified it a bit. The motivation for this change is that we observed performance issues with a repository on gitlab.com that has 500,000 refs but only 2,000 tags. The fetch traffic on that repo is dominated by CI, and when we changed CI to fetch with 'git fetch --no-tags' we saw a dramatic change in the CPU profile of git-pack-objects. This lead us to this particular ref walk. More details in: https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598 Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> --- builtin/pack-objects.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)