fetch: only run 'gc' once when fetching multiple remotes
diff mbox series

Message ID 20190619094630.32557-1-pclouds@gmail.com
State New
Headers show
Series
  • fetch: only run 'gc' once when fetching multiple remotes
Related show

Commit Message

Duy Nguyen June 19, 2019, 9:46 a.m. UTC
In multiple remotes mode, git-fetch is launched for n-1 remotes and the
last remote is handled by the current process. Each of these processes
will in turn run 'gc' at the end.

This is not really a problem because even if multiple 'gc --auto' is run
at the same time we still handle it correctly. It does show multiple
"auto packing in the background" messages though. And we may waste some
resources when gc actually runs because we still do some stuff before
checking the lock and moving it to background.

So let's try to avoid that. We should only need one 'gc' run after all
objects and references are added anyway. Add a new option --no-auto-gc
that will be used by those n-1 processes. 'gc --auto' will always run on
the main fetch process (*).

(*) even if we fetch remotes in parallel at some point in future, this
    should still be fine because we should "join" all those processes
    before this step.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/fetch-options.txt |  4 ++++
 builtin/fetch.c                 | 17 +++++++++++------
 t/t5514-fetch-multiple.sh       |  7 +++++--
 3 files changed, 20 insertions(+), 8 deletions(-)

Comments

Jeff King June 19, 2019, 6:59 p.m. UTC | #1
On Wed, Jun 19, 2019 at 04:46:30PM +0700, Nguyễn Thái Ngọc Duy wrote:

> In multiple remotes mode, git-fetch is launched for n-1 remotes and the
> last remote is handled by the current process. Each of these processes
> will in turn run 'gc' at the end.
> 
> This is not really a problem because even if multiple 'gc --auto' is run
> at the same time we still handle it correctly. It does show multiple
> "auto packing in the background" messages though. And we may waste some
> resources when gc actually runs because we still do some stuff before
> checking the lock and moving it to background.
> 
> So let's try to avoid that. We should only need one 'gc' run after all
> objects and references are added anyway. Add a new option --no-auto-gc
> that will be used by those n-1 processes. 'gc --auto' will always run on
> the main fetch process (*).

Yeah, that makes sense.

I was surprised that we needed a new command-line option here, but I
guess the sub-fetch processes really have no idea that they're
subservient to a multi-remote fetch (they do get "--append", but of
course somebody could specify that independently).

Another option would be to just pass "-c gc.auto=0" to the child
processes to inhibit auto-gc. But maybe it makes sense to have a nicer
interface (after all, somebody else could be doing the same "let's do a
bunch of fetches in a row" without using the multi-fetch code).

Though there I kind of wonder if this would apply to other scripted
uses, too. E.g., if I'm doing a bunch of commits, I might want to
inhibit auto-gc and then run it myself at the end. Should we support
"GIT_AUTO_GC=0" in the environment (and a matching "git --no-auto-gc
..." option that could be used here)?

>  Documentation/fetch-options.txt |  4 ++++
>  builtin/fetch.c                 | 17 +++++++++++------
>  t/t5514-fetch-multiple.sh       |  7 +++++--
>  3 files changed, 20 insertions(+), 8 deletions(-)

My musings above aside, the patch looks correct to me.

-Peff
Duy Nguyen June 20, 2019, 10:11 a.m. UTC | #2
On Thu, Jun 20, 2019 at 1:59 AM Jeff King <peff@peff.net> wrote:
> I was surprised that we needed a new command-line option here, but I
> guess the sub-fetch processes really have no idea that they're
> subservient to a multi-remote fetch (they do get "--append", but of
> course somebody could specify that independently).
>
> Another option would be to just pass "-c gc.auto=0" to the child
> processes to inhibit auto-gc. But maybe it makes sense to have a nicer
> interface (after all, somebody else could be doing the same "let's do a
> bunch of fetches in a row" without using the multi-fetch code).

Nah to me -c is much nicer (and flexible too). The only thing I'm not
sure about is whether a user could override it. If fetch.c adds -c
gc.auto=0 automatically, and the user wants auto gc back, will "git -c
gc.auto=non-zero fetch --multiple" still work?

I haven't checked git_config_push_parameter() carefully, but I have an
impression that the parameter order there is "wrong", at least in this
case.

> Though there I kind of wonder if this would apply to other scripted
> uses, too. E.g., if I'm doing a bunch of commits, I might want to
> inhibit auto-gc and then run it myself at the end. Should we support
> "GIT_AUTO_GC=0" in the environment (and a matching "git --no-auto-gc
> ..." option that could be used here)?

export GIT_CONFIG=gc.auto=0 ?
Jeff King June 20, 2019, 10:21 a.m. UTC | #3
On Thu, Jun 20, 2019 at 05:11:03PM +0700, Duy Nguyen wrote:

> > Another option would be to just pass "-c gc.auto=0" to the child
> > processes to inhibit auto-gc. But maybe it makes sense to have a nicer
> > interface (after all, somebody else could be doing the same "let's do a
> > bunch of fetches in a row" without using the multi-fetch code).
> 
> Nah to me -c is much nicer (and flexible too). The only thing I'm not
> sure about is whether a user could override it. If fetch.c adds -c
> gc.auto=0 automatically, and the user wants auto gc back, will "git -c
> gc.auto=non-zero fetch --multiple" still work?
> 
> I haven't checked git_config_push_parameter() carefully, but I have an
> impression that the parameter order there is "wrong", at least in this
> case.

It depends what you're trying to get it to do.

I'd expect that command to turn on auto-gc for just the outer fetch
(i.e., overriding any on-disk disabling of auto-gc), but keep it off for
the child fetches. And I think that is how it would behave: the outer
fetch only sees the config you provided. The inner ones see
"gc.auto=non-zero gc.auto=0-zero" because git_config_push_parameter()
appends. And they accept the latter under last-one-wins rules.

If you expect to be able to re-enable auto-gc in the child fetches, then
I don't think there would be a way to do that.

> > Though there I kind of wonder if this would apply to other scripted
> > uses, too. E.g., if I'm doing a bunch of commits, I might want to
> > inhibit auto-gc and then run it myself at the end. Should we support
> > "GIT_AUTO_GC=0" in the environment (and a matching "git --no-auto-gc
> > ..." option that could be used here)?
> 
> export GIT_CONFIG=gc.auto=0 ?

Almost. The parser for GIT_CONFIG_PARAMETERS is very picky about seeing
single quotes around each item, which makes it a little unfriendly to
use manually. Plus there may be existing values if your script was
invoked as "git -c whatever my-script".

It's probably not that big a deal for a script to just use "git -c
gc.auto" to inhibit auto-gc for each command that needs it.

-Peff

Patch
diff mbox series

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 91c47752ec..592f391298 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -88,6 +88,10 @@  ifndef::git-pull[]
 	Allow several <repository> and <group> arguments to be
 	specified. No <refspec>s may be specified.
 
+--[no-]auto-gc::
+	Run `git gc --auto` at the end to perform garbage collection
+	if needed. This is enabled by default.
+
 -p::
 --prune::
 	Before fetching, remove any remote-tracking references that no
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 4ba63d5ac6..6a3c507897 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -48,6 +48,7 @@  static int prune_tags = -1; /* unspecified */
 
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity, deepen_relative;
 static int progress = -1;
+static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
 static int max_children = 1;
 static enum transport_family family;
@@ -169,6 +170,8 @@  static struct option builtin_fetch_options[] = {
 	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
 			N_("report that we have only objects reachable from this object")),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
+		 N_("run 'gc --auto' after fetching")),
 	OPT_END()
 };
 
@@ -1424,7 +1427,7 @@  static int fetch_multiple(struct string_list *list)
 			return errcode;
 	}
 
-	argv_array_pushl(&argv, "fetch", "--append", NULL);
+	argv_array_pushl(&argv, "fetch", "--append", "--no-auto-gc", NULL);
 	add_options_to_argv(&argv);
 
 	for (i = 0; i < list->nr; i++) {
@@ -1674,11 +1677,13 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	close_all_packs(the_repository->objects);
 
-	argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL);
-	if (verbosity < 0)
-		argv_array_push(&argv_gc_auto, "--quiet");
-	run_command_v_opt(argv_gc_auto.argv, RUN_GIT_CMD);
-	argv_array_clear(&argv_gc_auto);
+	if (enable_auto_gc) {
+		argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL);
+		if (verbosity < 0)
+			argv_array_push(&argv_gc_auto, "--quiet");
+		run_command_v_opt(argv_gc_auto.argv, RUN_GIT_CMD);
+		argv_array_clear(&argv_gc_auto);
+	}
 
 	return result;
 }
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index 0030c92e1a..5426d4b5ab 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -105,9 +105,12 @@  test_expect_success 'git fetch --multiple (two remotes)' '
 	 git remote rm origin &&
 	 git remote add one ../one &&
 	 git remote add two ../two &&
-	 git fetch --multiple one two &&
+	 GIT_TRACE=1 git fetch --multiple one two 2>trace &&
 	 git branch -r > output &&
-	 test_cmp ../expect output)
+	 test_cmp ../expect output &&
+	 grep "built-in: git gc" trace >gc &&
+	 test_line_count = 1 gc
+	)
 '
 
 test_expect_success 'git fetch --multiple (bad remote names)' '