diff mbox series

[v2,1/2] midx: apply gitconfig to midx repack

Message ID 21c648cc486cf1abee51076d21e55649b1464516.1588758194.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series midx: apply gitconfig to midx repack | expand

Commit Message

Johannes Schindelin via GitGitGadget May 6, 2020, 9:43 a.m. UTC
From: Son Luong Ngoc <sluongng@gmail.com>

Multi-Pack-Index repack is an incremental, repack solutions
that allows user to consolidate multiple packfiles in a non-disruptive
way. However the new packfile could be created without some of the
capabilities of a packfile that is created by calling `git repack`.

This is because with `git repack`, there are configuration that would
enable different flags to be passed down to `git pack-objects` plumbing.

In this patch, I applies those flags into `git multi-pack-index repack`
so that it respect the `repack.*` config series.

Note:
- `repack.packKeptObjects` will be addressed by Derrick Stolee in
the following patch
- `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
requires `--all` to be passed onto `git pack-objects`, which is very
slow. I think it would be nice to have this in a future patch.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 midx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Derrick Stolee May 6, 2020, 12:03 p.m. UTC | #1
On 5/6/2020 5:43 AM, Son Luong Ngoc via GitGitGadget wrote:
> From: Son Luong Ngoc <sluongng@gmail.com>
...
> - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
> requires `--all` to be passed onto `git pack-objects`, which is very
> slow. I think it would be nice to have this in a future patch.

Just my two cents here: the reachability bitmaps are really tied to the
idea of a single pack right now. To create bitmaps, I would currently
suggest using the 'git repack' builtin with the proper options. That
command deletes the multi-pack-index, unfortunately, but it also produces
a single pack and deletes the others (when creating bitmaps).

You are right that the `--all` option required to pack-objects is not
appropriate to add inside `git multi-pack-index repack` as that changes
the pattern. It requires loading all reachable objects, even if they are
not already in packs covered by the multi-pack-index. This at minimum
violates expectations with the --batch-size argument.

Integrating reachability bitmaps more closely with the multi-pack-index
is certainly on our radar, but is a large endeavor.

This new patch looks good to me.

Thanks,
-Stolee
Junio C Hamano May 6, 2020, 5:03 p.m. UTC | #2
"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Multi-Pack-Index repack is an incremental, repack solutions
> that allows user to consolidate multiple packfiles in a non-disruptive
> way. However the new packfile could be created without some of the
> capabilities of a packfile that is created by calling `git repack`.

It may be clear to you who wrote the patch, but it is quite unclear
to readers how `repack` gets into the picture.  The first sentence
talks about what "git multi-pack-index repack" subcommand.  Unless
you mention that that "git multi-pack-index repack" subcommand calls
"git repack" under the hood in order to create a new packfile, the
second paragraph can be read as if you are pointing out a problem if
the user did

	$ git multi-pack-index repack
	$ git repack

and the explicit "repack" initiated by the user may create a
packfile that is somehow incompatible with what the previous repack
wanted to do, or something like that.

> This is because with `git repack`, there are configuration that would
> enable different flags to be passed down to `git pack-objects` plumbing.

And this does not help to clear the possible confusion, either.

I think all of the above is clearer if you rewrite the above
(including the title) like so:

    midx: teach "git multi-pack-index repack" honor "git repack" configuration

    When the "repack" subcommand of "git multi-pack-index" command
    creates new packfile(s), it does not call the "git repack"
    command but instead directly calls the "git pack-objects"
    command, and the configuration variables meant for the "git
    repack" command, like "repack.usedaeltabaseoffset", are ignored.

Now the problem description is behind us, let's see the description
of proposed solution.  We write this part in imperative mood, as if
we are giving an order to the codebase to "become like so".  We do
not say "I do X, I do Y".

> In this patch, I applies those flags into `git multi-pack-index repack`
> so that it respect the `repack.*` config series.

    Check the configuration variables used by "git repack" ourselves
    and pass the corresponding options to underlying "git pack-objects"
    in this codepath.


> Note:
> - `repack.packKeptObjects` will be addressed by Derrick Stolee in
> the following patch

This definitely does not belong to the commit log message.  It would
make a helpful note meant for the reviewers if written below the
three-dash line, though.

> - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
> requires `--all` to be passed onto `git pack-objects`, which is very
> slow. I think it would be nice to have this in a future patch.

The phrasing makes it hard to grok.  Do you want to say that the
repack.writeBitmaps configuration variable is ignored?

I think Derrick gave you the reason why bitmaps is not compatible
with midx in general, and that would be a better rationale to record
why the configuration is ignored.  Perhaps like

    Note that `repack.writeBitmaps` configuration is ignored, as the
    pack bitmap faciility is useful only with a single packfile.

or something like that?

Do we need to worry about the configuration variables understood by
the "git pack-objects" command to get in the way, by the way?
"pack.packsizelimit" may cause "git repack" to produce more than one
packfile, and if this codepath wants to avoid it (I do not know if
that is the case), it may have to override it from the command line,
for example.

> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
> ---
>  midx.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/midx.c b/midx.c
> index 9a61d3b37d9..3348f8e569b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	struct child_process cmd = CHILD_PROCESS_INIT;
>  	struct strbuf base_name = STRBUF_INIT;
>  	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
> +	int delta_base_offset = 1;

By default we use delta-base-offset, so if repo_config_get_bool()
did not see the repack.usedeltabaseoffset configuration defined in
any configuration file, we still want to see 1 after it returns.

> +	int use_delta_islands;

What is the reason why it is safe to leave this uninitialized?  Did
you mean 

	int use_delta_islands = 0;

here?

> @@ -1381,12 +1383,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	} else if (fill_included_packs_all(m, include_pack))
>  		goto cleanup;
>  
> +	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
> +	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
> +
>  	argv_array_push(&cmd.args, "pack-objects");
>  
>  	strbuf_addstr(&base_name, object_dir);
>  	strbuf_addstr(&base_name, "/pack/pack");
>  	argv_array_push(&cmd.args, base_name.buf);
>  
> +	if (delta_base_offset)
> +		argv_array_push(&cmd.args, "--delta-base-offset");
> +	if (use_delta_islands)
> +		argv_array_push(&cmd.args, "--delta-islands");
> +

These look like good changes.

>  	if (flags & MIDX_PROGRESS)
>  		argv_array_push(&cmd.args, "--progress");
>  	else

Thanks.
Son Luong Ngoc May 7, 2020, 7:29 a.m. UTC | #3
Hi Junio,

Thanks for the feedbacks

> On May 6, 2020, at 19:03, Junio C Hamano <gitster@pobox.com> wrote:

...
> We write this part in imperative mood, as if
> we are giving an order to the codebase to "become like so".  We do
> not say "I do X, I do Y".

This is a great feedback.
I will try to include all of your suggestions and edit the message
before submitting V3.

>> Note:
>> - `repack.packKeptObjects` will be addressed by Derrick Stolee in
>> the following patch
> 
> This definitely does not belong to the commit log message.  It would
> make a helpful note meant for the reviewers if written below the
> three-dash line, though.

Duly noted.

> Do we need to worry about the configuration variables understood by
> the "git pack-objects" command to get in the way, by the way?
> "pack.packsizelimit" may cause "git repack" to produce more than one
> packfile, and if this codepath wants to avoid it (I do not know if
> that is the case), it may have to override it from the command line,
> for example.

I dont think we want to avoid the packsizelimit here.
The point of repacking with midx is to help
end users consolidate multiple packfile in a non-disruptive way.

If you wish to put a constraint (i.e. packsizelimit, packKeptObjects) on this process,
you should be able to.

>> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
>> ---
>> midx.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>> 
>> diff --git a/midx.c b/midx.c
>> index 9a61d3b37d9..3348f8e569b 100644
>> --- a/midx.c
>> +++ b/midx.c
>> @@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>> 	struct child_process cmd = CHILD_PROCESS_INIT;
>> 	struct strbuf base_name = STRBUF_INIT;
>> 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
>> +	int delta_base_offset = 1;
> 
> By default we use delta-base-offset, so if repo_config_get_bool()
> did not see the repack.usedeltabaseoffset configuration defined in
> any configuration file, we still want to see 1 after it returns.
> 
>> +	int use_delta_islands;
> 
> What is the reason why it is safe to leave this uninitialized?  Did
> you mean 
> 
> 	int use_delta_islands = 0;
> 
> here?

I think I totally misread how repo_config_get_bool() supposed to work
Your comment here made me re-read it and things got a lot clearer.

Will set the default value to 0 in next version.

> Thanks.

Much appreciate,
Son Luong.
diff mbox series

Patch

diff --git a/midx.c b/midx.c
index 9a61d3b37d9..3348f8e569b 100644
--- a/midx.c
+++ b/midx.c
@@ -1369,6 +1369,8 @@  int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct strbuf base_name = STRBUF_INIT;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	int delta_base_offset = 1;
+	int use_delta_islands;
 
 	if (!m)
 		return 0;
@@ -1381,12 +1383,20 @@  int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	} else if (fill_included_packs_all(m, include_pack))
 		goto cleanup;
 
+	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
+	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
+
 	argv_array_push(&cmd.args, "pack-objects");
 
 	strbuf_addstr(&base_name, object_dir);
 	strbuf_addstr(&base_name, "/pack/pack");
 	argv_array_push(&cmd.args, base_name.buf);
 
+	if (delta_base_offset)
+		argv_array_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		argv_array_push(&cmd.args, "--delta-islands");
+
 	if (flags & MIDX_PROGRESS)
 		argv_array_push(&cmd.args, "--progress");
 	else