diff mbox series

[v3,6/7] maintenance: use default schedule if not configured

Message ID d833fffe89c94ecf3551c075960ba6d7607e9b17.1601902635.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Maintenance III: Background maintenance | expand

Commit Message

Philippe Blain via GitGitGadget Oct. 5, 2020, 12:57 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.

If a user sets any 'maintenance.<task>.schedule' config value, then
they have chosen a specific schedule for themselves and Git should
respect that when running 'git maintenance run --schedule=<frequency>'.

To make this process extremely simple for users, assume a default
schedule when no 'maintenance.<task>.schedule' or '...enabled' config
settings are concretely set. This is only an in-process assumption, so
future versions of Git could adjust this expected schedule.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt | 15 ++++++++
 builtin/gc.c                      | 58 +++++++++++++++++++++++++++++++
 t/t7900-maintenance.sh            | 11 +++---
 3 files changed, 80 insertions(+), 4 deletions(-)

Comments

Martin Ågren Oct. 5, 2020, 7:57 p.m. UTC | #1
Hi Stolee,

On Mon, 5 Oct 2020 at 15:07, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> The 'git maintenance (register|start)' subcommands add the current
> repository to the global Git config so maintenance will operate on that
> repository. It does not specify what maintenance should occur or how
> often.

I see that you posted a "how about this?" some days ago. I was offline
for the weekend with some margin on both sides, so I didn't see it until
now. Good that you just went ahead and posted the whole series anyway.

> If a user sets any 'maintenance.<task>.schedule' config value, then
> they have chosen a specific schedule for themselves and Git should
> respect that when running 'git maintenance run --schedule=<frequency>'.
>
> To make this process extremely simple for users, assume a default
> schedule when no 'maintenance.<task>.schedule' or '...enabled' config
> settings are concretely set. This is only an in-process assumption, so
> future versions of Git could adjust this expected schedule.

This obviously makes sense to me. ;-) One thing it does mean though is
that something like this:

  $ git maintenance register
  # Time goes by...
  # Someone said to try this:
  $ git config --add maintenance.commit-graph.schedule hourly
  $ git config --add maintenance.commit-graph.enable true
  # That could have been a no-op, since we were already on
  # such an hourly schedule, but it will effectively turn off
  # all other scheduled tasks. So some time later:
  # -- Why are my fetches so slow all of a sudden? :-(

That could be different if `git maintenance register` would turn on,
say, `maintenance.baseSchedule = standard` where setting those
`maintenance.commit-graph.*` would tweak that "standard" "base
schedule" (in a no-op way as it happens).

> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -37,6 +37,21 @@ register::

Adding some more context manually:

register::
	Initialize Git config values so any scheduled maintenance will
	start running on this repository. This adds the repository to the
	`maintenance.repo` config variable in the current user's global
	config and enables some recommended configuration values for
> 	`maintenance.<task>.schedule`. The tasks that are enabled are safe
> 	for running in the background without disrupting foreground
> 	processes.

The part about "and enables some recommended configuration values"
should probably be in this patch, not an earlier one, and maybe it
shouldn't even be here. With the new approach of this version, this
doesn't really enable some recommended configuration values. Or maybe
it does, I can't make up my mind, nor can I come up with an alternative
formulation.

> ++
> +If your repository has no `maintenance.<task>.schedule` configuration
> +values set, then Git will use a recommended default schedule that performs
> +background maintenance that will not interrupt foreground commands. The
> +default schedule is as follows:
> ++

If you add a line of "--" here...

> +* `gc`: disabled.
> +* `commit-graph`: hourly.
> +* `prefetch`: hourly.
> +* `loose-objects`: daily.
> +* `incremental-repack`: daily.

... and one here, you'll drop some indentation at this point so that
the next paragraph doesn't align with the list above. (See patch below.)

> ++
> +`git maintenance register` will also disable foreground maintenance by
> +setting `maintenance.auto = false` in the current repository. This config
> +setting will remain after a `git maintenance unregister` command.

That last paragraph does belong here. The part about the different
tasks ... maybe. With this new approach of not actually setting any
`schedule`/`enabled` configuration variables, that list doesn't
obviously have its natural home here. Maybe under `--schedule`, which is
where the detection actually happens and the default defaults are
imposed? Or maybe in a separate "CONFIGURATION" section. It could
include config/maintenance.txt, then go on to define the whole fallback
mechanism without having to worry about breaking the reader's flow. (The
way it is now, this `register` command is fairly heavy compared to the
surrounding parts.)

> +static int has_schedule_config(void)
> +{
> +       int i, found = 0;
> +       struct strbuf config_name = STRBUF_INIT;
> +       size_t prefix;
> +
> +       strbuf_addstr(&config_name, "maintenance.");
> +       prefix = config_name.len;
> +
> +       for (i = 0; !found && i < TASK__COUNT; i++) {
> +               char *value;
> +
> +               strbuf_setlen(&config_name, prefix);
> +               strbuf_addf(&config_name, "%s.schedule", tasks[i].name);
> +
> +               if (!git_config_get_string(config_name.buf, &value)) {
> +                       found = 1;
> +                       FREE_AND_NULL(value);
> +               }
> +
> +               strbuf_setlen(&config_name, prefix);
> +               strbuf_addf(&config_name, "%s.enabled", tasks[i].name);
> +
> +               if (!git_config_get_string(config_name.buf, &value)) {
> +                       found = 1;
> +                       FREE_AND_NULL(value);
> +               }
> +       }
> +
> +       strbuf_release(&config_name);
> +       return found;
> +}

I had the same reaction to `FREE_AND_NULL()` as on my previous reading.
If you have $reasons for doing it this way, not a big deal. I offer a
suggestion in patch form below anyway. Feel free to squash, adapt or
ignore as you see fit.

> +
> +static void set_recommended_schedule(void)
> +{
> +       if (has_schedule_config())
> +               return;
> +
> +       tasks[TASK_GC].enabled = 0;
> +
> +       tasks[TASK_PREFETCH].enabled = 1;
> +       tasks[TASK_PREFETCH].schedule = SCHEDULE_HOURLY;
> +
> +       tasks[TASK_COMMIT_GRAPH].enabled = 1;
> +       tasks[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY;
> +
> +       tasks[TASK_LOOSE_OBJECTS].enabled = 1;
> +       tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY;
> +
> +       tasks[TASK_INCREMENTAL_REPACK].enabled = 1;
> +       tasks[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY;
> +}
> +

One thing I can't make up my mind about is how these `enabled` are used
for two purposes: Deciding what to do on `git maintenance run` without
any `--task`, and deciding what to do on `git maintenance run
--scheduled`.

>  static int maintenance_run_tasks(struct maintenance_run_opts *opts)
>  {
>         int i, found_selected = 0;
> @@ -1280,6 +1333,8 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
>
>         if (found_selected)
>                 QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
> +       else if (opts->schedule != SCHEDULE_NONE)
> +               set_recommended_schedule();

... And especially how we only impose the magic
`maintenance.<task>.enabled` values when we are running with
`--schedule`. So the answer to "what is the default value of
`maintenance.commit-graph.enabled`?" is "it depends on several factors".

Sort of related: The presence of a `maintenance.<task>.schedule` is not
sufficient to schedule the task. This looks like something that one
could easily trip on. Maybe you have already considered letting a zero
value for `maintenance.<task>.schedule` mean "disabled" and ignoring the
`enabled` config item for the scheduled runs, but rejected that for good
reasons?

> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 7715e40391..7154987fd2 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -305,11 +305,14 @@ test_expect_success 'register and unregister' '
>         git config --global --add maintenance.repo /existing1 &&
>         git config --global --add maintenance.repo /existing2 &&
>         git config --global --get-all maintenance.repo >before &&
> +
>         git maintenance register &&
> -       git config --global --get-all maintenance.repo >actual &&
> -       cp before after &&
> -       pwd >>after &&
> -       test_cmp after actual &&
> +       test_cmp_config false maintenance.auto &&
> +       git config --global --get-all maintenance.repo >between &&
> +       cp before expect &&
> +       pwd >>expect &&
> +       test_cmp expect between &&
> +
>         git maintenance unregister &&
>         git config --global --get-all maintenance.repo >actual &&
>         test_cmp before actual

This tests the one-time config tweaking. But we don't test any of the
"detect no config and impose a default" logic. Neither that it kicks in
at all, nor that it doesn't when it shouldn't.

As mentioned above, I end with some minor suggestions.

Martin

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 738a4c7ebd..2085b53dc5 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -43,11 +43,13 @@ values set, then Git will use a recommended default schedule that performs
 background maintenance that will not interrupt foreground commands. The
 default schedule is as follows:
 +
+--
 * `gc`: disabled.
 * `commit-graph`: hourly.
 * `prefetch`: hourly.
 * `loose-objects`: daily.
 * `incremental-repack`: daily.
+--
 +
 `git maintenance register` will also disable foreground maintenance by
 setting `maintenance.auto = false` in the current repository. This config
diff --git a/builtin/gc.c b/builtin/gc.c
index 965690704b..63f4c102b1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1253,35 +1253,31 @@ static int compare_tasks_by_selection(const void *a_, const void *b_)
 
 static int has_schedule_config(void)
 {
-	int i, found = 0;
+	int i;
 	struct strbuf config_name = STRBUF_INIT;
 	size_t prefix;
+	char *value = NULL;
 
 	strbuf_addstr(&config_name, "maintenance.");
 	prefix = config_name.len;
 
-	for (i = 0; !found && i < TASK__COUNT; i++) {
-		char *value;
-
+	for (i = 0; i < TASK__COUNT; i++) {
 		strbuf_setlen(&config_name, prefix);
 		strbuf_addf(&config_name, "%s.schedule", tasks[i].name);
 
-		if (!git_config_get_string(config_name.buf, &value)) {
-			found = 1;
-			FREE_AND_NULL(value);
-		}
+		if (!git_config_get_string(config_name.buf, &value))
+			break;
 
 		strbuf_setlen(&config_name, prefix);
 		strbuf_addf(&config_name, "%s.enabled", tasks[i].name);
 
-		if (!git_config_get_string(config_name.buf, &value)) {
-			found = 1;
-			FREE_AND_NULL(value);
-		}
+		if (!git_config_get_string(config_name.buf, &value))
+			break;
 	}
 
 	strbuf_release(&config_name);
-	return found;
+	free(value);
+	return i < TASK__COUNT;
 }
 
 static void set_recommended_schedule(void)
Derrick Stolee Oct. 8, 2020, 1:32 p.m. UTC | #2
On 10/5/2020 3:57 PM, Martin Ågren wrote:
> On Mon, 5 Oct 2020 at 15:07, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> To make this process extremely simple for users, assume a default
>> schedule when no 'maintenance.<task>.schedule' or '...enabled' config
>> settings are concretely set. This is only an in-process assumption, so
>> future versions of Git could adjust this expected schedule.
> 
> This obviously makes sense to me. ;-) One thing it does mean though is
> that something like this:
> 
>   $ git maintenance register
>   # Time goes by...
>   # Someone said to try this:
>   $ git config --add maintenance.commit-graph.schedule hourly
>   $ git config --add maintenance.commit-graph.enable true
>   # That could have been a no-op, since we were already on
>   # such an hourly schedule, but it will effectively turn off
>   # all other scheduled tasks. So some time later:
>   # -- Why are my fetches so slow all of a sudden? :-(
> 
> That could be different if `git maintenance register` would turn on,
> say, `maintenance.baseSchedule = standard` where setting those
> `maintenance.commit-graph.*` would tweak that "standard" "base
> schedule" (in a no-op way as it happens).

Thank you so much for your detailed feedback! This is an excellent
point and I will be sure to account for it when I have time to
carefully examine the options and these kind of workflows.

Right now, I'm a bit underwater getting ready for the v2.29.0
release (in microsoft/git, Scalar, and VFS for Git) but I will
revisit this as my main focus after this release cycle. I have
not forgotten about this topic!!!

> As mentioned above, I end with some minor suggestions.

I really appreciate the effort you put in to this fixup.
Thanks,
-Stolee
diff mbox series

Patch

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 7628a6d157..52fff86844 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -37,6 +37,21 @@  register::
 	`maintenance.<task>.schedule`. The tasks that are enabled are safe
 	for running in the background without disrupting foreground
 	processes.
++
+If your repository has no `maintenance.<task>.schedule` configuration
+values set, then Git will use a recommended default schedule that performs
+background maintenance that will not interrupt foreground commands. The
+default schedule is as follows:
++
+* `gc`: disabled.
+* `commit-graph`: hourly.
+* `prefetch`: hourly.
+* `loose-objects`: daily.
+* `incremental-repack`: daily.
++
+`git maintenance register` will also disable foreground maintenance by
+setting `maintenance.auto = false` in the current repository. This config
+setting will remain after a `git maintenance unregister` command.
 
 run::
 	Run one or more maintenance tasks. If one or more `--task` options
diff --git a/builtin/gc.c b/builtin/gc.c
index a387f46585..965690704b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1251,6 +1251,59 @@  static int compare_tasks_by_selection(const void *a_, const void *b_)
 	return b->selected_order - a->selected_order;
 }
 
+static int has_schedule_config(void)
+{
+	int i, found = 0;
+	struct strbuf config_name = STRBUF_INIT;
+	size_t prefix;
+
+	strbuf_addstr(&config_name, "maintenance.");
+	prefix = config_name.len;
+
+	for (i = 0; !found && i < TASK__COUNT; i++) {
+		char *value;
+
+		strbuf_setlen(&config_name, prefix);
+		strbuf_addf(&config_name, "%s.schedule", tasks[i].name);
+
+		if (!git_config_get_string(config_name.buf, &value)) {
+			found = 1;
+			FREE_AND_NULL(value);
+		}
+
+		strbuf_setlen(&config_name, prefix);
+		strbuf_addf(&config_name, "%s.enabled", tasks[i].name);
+
+		if (!git_config_get_string(config_name.buf, &value)) {
+			found = 1;
+			FREE_AND_NULL(value);
+		}
+	}
+
+	strbuf_release(&config_name);
+	return found;
+}
+
+static void set_recommended_schedule(void)
+{
+	if (has_schedule_config())
+		return;
+
+	tasks[TASK_GC].enabled = 0;
+
+	tasks[TASK_PREFETCH].enabled = 1;
+	tasks[TASK_PREFETCH].schedule = SCHEDULE_HOURLY;
+
+	tasks[TASK_COMMIT_GRAPH].enabled = 1;
+	tasks[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY;
+
+	tasks[TASK_LOOSE_OBJECTS].enabled = 1;
+	tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY;
+
+	tasks[TASK_INCREMENTAL_REPACK].enabled = 1;
+	tasks[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY;
+}
+
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
 	int i, found_selected = 0;
@@ -1280,6 +1333,8 @@  static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 
 	if (found_selected)
 		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
+	else if (opts->schedule != SCHEDULE_NONE)
+		set_recommended_schedule();
 
 	for (i = 0; i < TASK__COUNT; i++) {
 		if (found_selected && tasks[i].selected_order < 0)
@@ -1417,6 +1472,9 @@  static int maintenance_register(void)
 	if (!the_repository || !the_repository->gitdir)
 		return 0;
 
+	/* Disable foreground maintenance */
+	git_config_set("maintenance.auto", "false");
+
 	config_get.git_cmd = 1;
 	strvec_pushl(&config_get.args, "config", "--global", "--get", "maintenance.repo",
 		     the_repository->worktree ? the_repository->worktree
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 7715e40391..7154987fd2 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -305,11 +305,14 @@  test_expect_success 'register and unregister' '
 	git config --global --add maintenance.repo /existing1 &&
 	git config --global --add maintenance.repo /existing2 &&
 	git config --global --get-all maintenance.repo >before &&
+
 	git maintenance register &&
-	git config --global --get-all maintenance.repo >actual &&
-	cp before after &&
-	pwd >>after &&
-	test_cmp after actual &&
+	test_cmp_config false maintenance.auto &&
+	git config --global --get-all maintenance.repo >between &&
+	cp before expect &&
+	pwd >>expect &&
+	test_cmp expect between &&
+
 	git maintenance unregister &&
 	git config --global --get-all maintenance.repo >actual &&
 	test_cmp before actual