Message ID | pull.68.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Series | send-pack: set core.warnAmbiguousRefs=false |
On Tue, Nov 06, 2018 at 11:13:47AM -0800, Derrick Stolee via GitGitGadget wrote:

> I've been looking into the performance of git push for very large repos. Our
> users are reporting that 60-80% of git push time is spent during the
> "Enumerating objects" phase of git pack-objects.
>
> A git push process runs several processes during its run, but one includes
> git send-pack which calls git pack-objects and passes the known have/wants
> into stdin using object ids. However, the default setting for
> core.warnAmbiguousRefs requires git pack-objects to check for ref names
> matching the ref_rev_parse_rules array in refs.c. This means that every
> object is triggering at least six "file exists?" queries.
>
> When there are a lot of refs, this can add up significantly! My PerfView
> trace for a simple push measured 3 seconds spent checking these paths.

Some of this might be useful in the commit message. :)

> The fix for this is simple: set core.warnAmbiguousRefs to false for this
> specific call of git pack-objects coming from git send-pack. We don't want
> to default it to false for all calls to git pack-objects, as it is valid to
> pass ref names instead of object ids. This helps regain these seconds during
> a push.

I don't think you actually care about the ambiguity check between refs
here; you just care about avoiding the ref check when we've seen (and
are mostly expecting) a 40-hex sha1. We have a more specific flag for
that: warn_on_object_refname_ambiguity.

And I think it would be OK to enable that all the time for pack-objects,
which is plumbing that does typically expect object names. See prior art
in 25fba78d36 (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
ambiguity check with --stdin, 2014-03-12).

> Derrick Stolee (1):
>   send-pack: set core.warnAmbiguousRefs=false
>
>  send-pack.c | 2 ++
>  1 file changed, 2 insertions(+)

Whenever I see a change like this to the pack-objects invocation for
send-pack, it makes me wonder if upload-pack would want the same thing.

It's a moot point if we just set the flag directly inside
pack-objects, though.

-Peff

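
To make the "at least six file exists? queries" figure concrete, here is a minimal sketch of how one name fans out into candidate ref paths. The rule list is written from memory of refs.c around the time of this thread, so treat it as illustrative rather than authoritative. expand_ref_candidates() and CANDIDATE_MAX are hypothetical names invented for this example and are not part of Git.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Assumption: recalled from refs.c of that era; the real array may differ.
 * Each entry is a printf-style pattern applied to the name being resolved.
 */
static const char *ref_rev_parse_rules[] = {
	"%.*s",
	"refs/%.*s",
	"refs/tags/%.*s",
	"refs/heads/%.*s",
	"refs/remotes/%.*s",
	"refs/remotes/%.*s/HEAD",
	NULL
};

#define CANDIDATE_MAX 128

/*
 * Hypothetical helper (not a Git function): write every ref path that
 * would have to be probed for `name` into `out`, and return how many
 * candidates were produced.
 */
static int expand_ref_candidates(const char *name,
				 char out[][CANDIDATE_MAX], int max)
{
	int n = 0;
	const char **rule;

	for (rule = ref_rev_parse_rules; *rule; rule++) {
		assert(n < max);
		snprintf(out[n], CANDIDATE_MAX, *rule, (int)strlen(name), name);
		n++;
	}
	return n;
}
```

Under that assumption, every object id read from stdin yields six candidate paths, and each candidate costs at least one existence check while the ambiguity warning is active. That is why the cost grows with the number of have/want lines.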
On Tue, Nov 06, 2018 at 02:44:42PM -0500, Jeff King wrote:

> > The fix for this is simple: set core.warnAmbiguousRefs to false for this
> > specific call of git pack-objects coming from git send-pack. We don't want
> > to default it to false for all calls to git pack-objects, as it is valid to
> > pass ref names instead of object ids. This helps regain these seconds during
> > a push.
>
> I don't think you actually care about the ambiguity check between refs
> here; you just care about avoiding the ref check when we've seen (and
> are mostly expecting) a 40-hex sha1. We have a more specific flag for
> that: warn_on_object_refname_ambiguity.
>
> And I think it would be OK to enable that all the time for pack-objects,
> which is plumbing that does typically expect object names. See prior art
> in 25fba78d36 (cat-file: disable object/refname ambiguity check for
> batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
> ambiguity check with --stdin, 2014-03-12).

I'd probably do it here:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e50c6cd1ff..d370638a5d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3104,6 +3104,7 @@ static void get_object_list(int ac, const char **av)
 	struct rev_info revs;
 	char line[1000];
 	int flags = 0;
+	int save_warning;
 
 	repo_init_revisions(the_repository, &revs, NULL);
 	save_commit_buffer = 0;
@@ -3112,6 +3113,9 @@ static void get_object_list(int ac, const char **av)
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
 
+	save_warning = warn_on_object_refname_ambiguity;
+	warn_on_object_refname_ambiguity = 0;
+
 	while (fgets(line, sizeof(line), stdin) != NULL) {
 		int len = strlen(line);
 		if (len && line[len - 1] == '\n')
@@ -3138,6 +3142,8 @@ static void get_object_list(int ac, const char **av)
 			die(_("bad revision '%s'"), line);
 	}
 
+	warn_on_object_refname_ambiguity = save_warning;
+
 	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
 		return;
 

But I'll leave it to you to wrap that up in a patch, since you probably
should re-check your timings (which it would be interesting to include
in the commit message, if you have reproducible timings).

-Peff

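
The save/clear/restore pattern in the diff above can be tried outside of Git with a small stand-alone sketch. Everything here is a stand-in: the global only mimics Git's warn_on_object_refname_ambiguity, and resolve_object_name(), resolve_object_list() and ref_probe_count are hypothetical names that make the effect observable. The 40-character length test and the cost of six probes per name are simplifying assumptions, not Git's actual logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for Git's global flag of the same name (assumption). */
static int warn_on_object_refname_ambiguity = 1;

/* Hypothetical counter: how many ref lookups the resolver performed. */
static int ref_probe_count;

/*
 * Hypothetical resolver: when the flag is set and the name looks like a
 * full 40-character object id, pay for one probe per ref-parse rule.
 */
static void resolve_object_name(const char *name)
{
	if (warn_on_object_refname_ambiguity && strlen(name) == 40)
		ref_probe_count += 6;
}

/*
 * Mirror of the pattern in the diff: remember the flag, switch the
 * check off for the whole batch, then restore the caller's setting.
 * Returns the number of names processed.
 */
static size_t resolve_object_list(const char *const *names, size_t n)
{
	size_t i;
	int save_warning = warn_on_object_refname_ambiguity;

	warn_on_object_refname_ambiguity = 0;
	for (i = 0; i < n; i++)
		resolve_object_name(names[i]);
	warn_on_object_refname_ambiguity = save_warning;

	return i;
}
```

Restoring the saved value, instead of unconditionally setting the flag back to 1, keeps the helper neutral for callers that had already disabled the check. It also limits the change to the batch loop, so names resolved elsewhere in the process still get the warning.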
On 11/6/2018 2:44 PM, Jeff King wrote:

> On Tue, Nov 06, 2018 at 11:13:47AM -0800, Derrick Stolee via GitGitGadget wrote:
>
>> I've been looking into the performance of git push for very large repos. Our
>> users are reporting that 60-80% of git push time is spent during the
>> "Enumerating objects" phase of git pack-objects.
>>
>> A git push process runs several processes during its run, but one includes
>> git send-pack which calls git pack-objects and passes the known have/wants
>> into stdin using object ids. However, the default setting for
>> core.warnAmbiguousRefs requires git pack-objects to check for ref names
>> matching the ref_rev_parse_rules array in refs.c. This means that every
>> object is triggering at least six "file exists?" queries.
>>
>> When there are a lot of refs, this can add up significantly! My PerfView
>> trace for a simple push measured 3 seconds spent checking these paths.
> Some of this might be useful in the commit message. :)
>
>> The fix for this is simple: set core.warnAmbiguousRefs to false for this
>> specific call of git pack-objects coming from git send-pack. We don't want
>> to default it to false for all calls to git pack-objects, as it is valid to
>> pass ref names instead of object ids. This helps regain these seconds during
>> a push.
> I don't think you actually care about the ambiguity check between refs
> here; you just care about avoiding the ref check when we've seen (and
> are mostly expecting) a 40-hex sha1. We have a more specific flag for
> that: warn_on_object_refname_ambiguity.
>
> And I think it would be OK to enable that all the time for pack-objects,
> which is plumbing that does typically expect object names. See prior art
> in 25fba78d36 (cat-file: disable object/refname ambiguity check for
> batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
> ambiguity check with --stdin, 2014-03-12).

Thanks for these pointers. Helps to know there is precedent for shutting
down the behavior without relying on "-c" flags.

> Whenever I see a change like this to the pack-objects invocation for
> send-pack, it makes me wonder if upload-pack would want the same thing.
>
> It's a moot point if we just set the flag directly inside
> pack-objects, though.

I'll send a v2 that does just that.

Thanks,
-Stolee

On 11/6/2018 2:51 PM, Jeff King wrote:

> On Tue, Nov 06, 2018 at 02:44:42PM -0500, Jeff King wrote:
>
>>> The fix for this is simple: set core.warnAmbiguousRefs to false for this
>>> specific call of git pack-objects coming from git send-pack. We don't want
>>> to default it to false for all calls to git pack-objects, as it is valid to
>>> pass ref names instead of object ids. This helps regain these seconds during
>>> a push.
>> I don't think you actually care about the ambiguity check between refs
>> here; you just care about avoiding the ref check when we've seen (and
>> are mostly expecting) a 40-hex sha1. We have a more specific flag for
>> that: warn_on_object_refname_ambiguity.
>>
>> And I think it would be OK to enable that all the time for pack-objects,
>> which is plumbing that does typically expect object names. See prior art
>> in 25fba78d36 (cat-file: disable object/refname ambiguity check for
>> batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
>> ambiguity check with --stdin, 2014-03-12).
> I'd probably do it here:
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index e50c6cd1ff..d370638a5d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3104,6 +3104,7 @@ static void get_object_list(int ac, const char **av)

Scoping the change into get_object_list does make sense. I was doing it
a level higher, which is not worth it. I'll reproduce your change here.

>  	struct rev_info revs;
>  	char line[1000];
>  	int flags = 0;
> +	int save_warning;
>  
>  	repo_init_revisions(the_repository, &revs, NULL);
>  	save_commit_buffer = 0;
> @@ -3112,6 +3113,9 @@ static void get_object_list(int ac, const char **av)
>  	/* make sure shallows are read */
>  	is_repository_shallow(the_repository);
>  
> +	save_warning = warn_on_object_refname_ambiguity;
> +	warn_on_object_refname_ambiguity = 0;
> +
>  	while (fgets(line, sizeof(line), stdin) != NULL) {
>  		int len = strlen(line);
>  		if (len && line[len - 1] == '\n')
> @@ -3138,6 +3142,8 @@ static void get_object_list(int ac, const char **av)
>  			die(_("bad revision '%s'"), line);
>  	}
>  
> +	warn_on_object_refname_ambiguity = save_warning;
> +
>  	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
>  		return;
>  
>
> But I'll leave it to you to wrap that up in a patch, since you probably
> should re-check your timings (which it would be interesting to include
> in the commit message, if you have reproducible timings).

The timings change a lot depending on the disk cache and the remote
refs, which is unfortunate, but I have measured a three-second
improvement.

Thanks,
-Stolee