diff mbox series

[v4,6/7] fetch: after refetch, encourage auto gc repacking

Message ID 28c07219fd830196af1171320b86bc2a58ba3d79.1648476132.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 7390f05a3c674e354ba2f52632046fa0a5c3e501
Series fetch: add repair: full refetch without negotiation (was: "refiltering")

Commit Message

Robert Coup March 28, 2022, 2:02 p.m. UTC
From: Robert Coup <robert@coup.net.nz>

After invoking `fetch --refetch`, the object db will likely contain many
duplicate objects. If auto-maintenance is enabled, invoke it with
appropriate settings to encourage repacking/consolidation.

* gc.autoPackLimit: unless this is set to 0 (disabled), override the
  value to 1 to force pack consolidation.
* maintenance.incremental-repack.auto: unless this is set to 0, override
  the value to -1 to force incremental repacking.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/fetch-options.txt |  3 ++-
 builtin/fetch.c                 | 19 ++++++++++++++++++-
 t/t5616-partial-clone.sh        | 29 +++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 2 deletions(-)

Comments

Ævar Arnfjörð Bjarmason March 31, 2022, 3:22 p.m. UTC | #1
On Mon, Mar 28 2022, Robert Coup via GitGitGadget wrote:

> From: Robert Coup <robert@coup.net.nz>
>
> After invoking `fetch --refetch`, the object db will likely contain many
> duplicate objects. If auto-maintenance is enabled, invoke it with
> appropriate settings to encourage repacking/consolidation.
>
> * gc.autoPackLimit: unless this is set to 0 (disabled), override the
>   value to 1 to force pack consolidation.
> * maintenance.incremental-repack.auto: unless this is set to 0, override
>   the value to -1 to force incremental repacking.
>
> Signed-off-by: Robert Coup <robert@coup.net.nz>
> ---
>  Documentation/fetch-options.txt |  3 ++-
>  builtin/fetch.c                 | 19 ++++++++++++++++++-
>  t/t5616-partial-clone.sh        | 29 +++++++++++++++++++++++++++++
>  3 files changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
> index d03fce5aae0..622bd84768b 100644
> --- a/Documentation/fetch-options.txt
> +++ b/Documentation/fetch-options.txt
> @@ -169,7 +169,8 @@ ifndef::git-pull[]
>  	associated objects that are already present locally, this option fetches
>  	all objects as a fresh clone would. Use this to reapply a partial clone
>  	filter from configuration or using `--filter=` when the filter
> -	definition has changed.
> +	definition has changed. Automatic post-fetch maintenance will perform
> +	object database pack consolidation to remove any duplicate objects.
>  endif::git-pull[]
>  
>  --refmap=<refspec>::
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index e391a5dbc55..e3791f09ed5 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2306,8 +2306,25 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  					     NULL);
>  	}
>  
> -	if (enable_auto_gc)
> +	if (enable_auto_gc) {
> +		if (refetch) {
> +			/*
> +			 * Hint auto-maintenance strongly to encourage repacking,
> +			 * but respect config settings disabling it.
> +			 */
> +			int opt_val;

nit: add a \n after this.

> +			if (git_config_get_int("gc.autopacklimit", &opt_val))
> +				opt_val = -1;
> +			if (opt_val != 0)

nit: don't compare against 0 or null,  just !opt_val

Isn't this whole thing also clearer as:

	int &forget;

        if (git_conf...(..., &forget))
		git_config_push_parameter("gc.autoPackLimit=1");

Maybe I haven't eyeballed this enough, but aren't you ignoring explicit
gc.autoPackLimit=0 configuration? Whereas what you seem to want is "set
this config unless the user has it set", for which we only need to
check the git_config...(...) return value, no?

> +				git_config_push_parameter("gc.autoPackLimit=1");
> +
> +			if (git_config_get_int("maintenance.incremental-repack.auto", &opt_val))
> +				opt_val = -1;
> +			if (opt_val != 0)
> +				git_config_push_parameter("maintenance.incremental-repack.auto=-1");

hrm, do we really need to set both of these these days (not saying we
don't, just surprised). I.e. both gc.* and maintenance.* config.

*skims the code*

Urgh, yes? too_many_packs() seems to check gc.* only, but
incremental_repack_auto_condition() checks this variable... :(

> +test_expect_success 'fetch --refetch triggers repacking' '
> +	GIT_TRACE2_CONFIG_PARAMS=gc.autoPackLimit,maintenance.incremental-repack.auto &&

Nit: Can we use GIT_CONFIG_KEY_* et al for this these days, or do we
still need this trace2 thingy?

> +	export GIT_TRACE2_CONFIG_PARAMS &&
> +

Robert Coup April 1, 2022, 10:51 a.m. UTC | #2
Hi Ævar,

On Thu, 31 Mar 2022 at 16:33, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:


> > +                     if (git_config_get_int("gc.autopacklimit", &opt_val))
> > +                             opt_val = -1;
> > +                     if (opt_val != 0)
>
> nit: don't compare against 0 or null,  just !opt_val

I did this since 0 has a specific meaning ("Setting this to 0
disables"), it's not just false-y in this context. Tomayto, tomahto?

>
> Isn't this whole thing also clearer as:
>
>         int &forget;
>
>         if (git_conf...(..., &forget))
>                 git_config_push_parameter("gc.autoPackLimit=1");
>
> Maybe I haven't eyeballed this enough, but aren't you ignoring explicit
> gc.autoPackLimit=0 configuration? Whereas what you seem to want is "set
> this config unless the user has it set", for which we only need to
> check the git_config...(...) return value, no?

What I'm trying to achieve: if the user has not disabled auto-packing
(autoPackLimit=0), then pass autoPackLimit=1 to the subprocess to
encourage repacking.
Context/why: so we don't 2x the object store size and not even attempt
to repack it now, rather than at some unspecified point in the future.
Maybe.

How the code achieves it:
  load autoPackLimit into opt_val
  if autoPackLimit is not specified in config: set opt_val to -1
  if opt_val is not 0: pass autoPackLimit=1 to the subprocess

AFAICT if we just if(git_config_get_int()) then if they haven't set it
at all in config, we wouldn't encourage repacking in the subprocess.
Which isn't what I'm trying to achieve.
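For readers following along, the three steps listed above can be sketched as a standalone C program fragment. This is only an illustration, not the code in builtin/fetch.c: `struct fake_config` and `fake_config_get_int()` are hypothetical stand-ins, assumed to behave the way the patch uses git_config_get_int() (return 0 and store the value when the key is set, return non-zero when it is missing). The second helper models a check on the return value alone, for comparison.

```c
/* Hypothetical stand-in for one config key: is it set, and to what value? */
struct fake_config {
	int is_set;
	int value;
};

/*
 * Hypothetical lookup, assumed to mirror how the patch uses
 * git_config_get_int(): 0 when the key is set (value stored in *dest),
 * non-zero when the key is missing.
 */
static int fake_config_get_int(const struct fake_config *cfg, int *dest)
{
	if (!cfg->is_set)
		return 1;
	*dest = cfg->value;
	return 0;
}

/*
 * The approach described in the steps above: override unless the user
 * explicitly disabled the feature by setting the key to 0.
 */
static int should_override(const struct fake_config *cfg)
{
	int opt_val;

	if (fake_config_get_int(cfg, &opt_val))
		opt_val = -1;	/* unset: treat as "not disabled" */
	return opt_val != 0;	/* an explicit 0 means disabled, leave it alone */
}

/*
 * For comparison: decide on the lookup's return value alone, so the
 * override happens only when the key is not set at all.
 */
static int should_override_if_unset(const struct fake_config *cfg)
{
	int ignored;

	return fake_config_get_int(cfg, &ignored) != 0;
}
```

Under these assumptions the two helpers agree when the key is unset or set to 0, and differ only when the user has configured a non-zero value such as 1234: the first still overrides it, the second leaves it as configured.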

> hrm, do we really need to set both of these these days (not saying we
> don't, just surprised). I.e. both gc.* and maintenance.* config.
>
> *skims the code*
>
> Urgh, yes? too_many_packs() seems to check gc.* only, but
> incremental_repack_auto_condition() checks this variable... :(

Yes.

>
> > +test_expect_success 'fetch --refetch triggers repacking' '
> > +     GIT_TRACE2_CONFIG_PARAMS=gc.autoPackLimit,maintenance.incremental-repack.auto &&
>
> Nit: Can we use GIT_CONFIG_KEY_* et al for this these days, or do we
> still need this trace2 thingy?

I copied a pattern existing tests are using.

Thanks, Rob.

Patch

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index d03fce5aae0..622bd84768b 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -169,7 +169,8 @@  ifndef::git-pull[]
 	associated objects that are already present locally, this option fetches
 	all objects as a fresh clone would. Use this to reapply a partial clone
 	filter from configuration or using `--filter=` when the filter
-	definition has changed.
+	definition has changed. Automatic post-fetch maintenance will perform
+	object database pack consolidation to remove any duplicate objects.
 endif::git-pull[]
 
 --refmap=<refspec>::
diff --git a/builtin/fetch.c b/builtin/fetch.c
index e391a5dbc55..e3791f09ed5 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2306,8 +2306,25 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 					     NULL);
 	}
 
-	if (enable_auto_gc)
+	if (enable_auto_gc) {
+		if (refetch) {
+			/*
+			 * Hint auto-maintenance strongly to encourage repacking,
+			 * but respect config settings disabling it.
+			 */
+			int opt_val;
+			if (git_config_get_int("gc.autopacklimit", &opt_val))
+				opt_val = -1;
+			if (opt_val != 0)
+				git_config_push_parameter("gc.autoPackLimit=1");
+
+			if (git_config_get_int("maintenance.incremental-repack.auto", &opt_val))
+				opt_val = -1;
+			if (opt_val != 0)
+				git_config_push_parameter("maintenance.incremental-repack.auto=-1");
+		}
 		run_auto_maintenance(verbosity < 0);
+	}
 
  cleanup:
 	string_list_clear(&list, 0);
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 87ebf4b0b1c..4a3778d04a8 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -216,6 +216,35 @@  test_expect_success 'fetch --refetch works with a shallow clone' '
 	test_line_count = 6 observed
 '
 
+test_expect_success 'fetch --refetch triggers repacking' '
+	GIT_TRACE2_CONFIG_PARAMS=gc.autoPackLimit,maintenance.incremental-repack.auto &&
+	export GIT_TRACE2_CONFIG_PARAMS &&
+
+	GIT_TRACE2_EVENT="$PWD/trace1.event" \
+	git -C pc1 fetch --refetch origin &&
+	test_subcommand git maintenance run --auto --no-quiet <trace1.event &&
+	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace1.event &&
+	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace1.event &&
+
+	GIT_TRACE2_EVENT="$PWD/trace2.event" \
+	git -c protocol.version=0 \
+		-c gc.autoPackLimit=0 \
+		-c maintenance.incremental-repack.auto=1234 \
+		-C pc1 fetch --refetch origin &&
+	test_subcommand git maintenance run --auto --no-quiet <trace2.event &&
+	grep \"param\":\"gc.autopacklimit\",\"value\":\"0\" trace2.event &&
+	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace2.event &&
+
+	GIT_TRACE2_EVENT="$PWD/trace3.event" \
+	git -c protocol.version=0 \
+		-c gc.autoPackLimit=1234 \
+		-c maintenance.incremental-repack.auto=0 \
+		-C pc1 fetch --refetch origin &&
+	test_subcommand git maintenance run --auto --no-quiet <trace3.event &&
+	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace3.event &&
+	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"0\" trace3.event
+'
+
 test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
 	test_create_repo submodule &&
 	test_commit -C submodule mycommit &&