diff mbox series

send-pack.c: add config push.useBitmaps

Message ID pull.1263.git.1655291320433.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series send-pack.c: add config push.useBitmaps | expand

Commit Message

Kyle Zhao June 15, 2022, 11:08 a.m. UTC
From: Kyle Zhao <kylezhao@tencent.com>

This allows you to disabled bitmaps for "git push". Default is false.

Bitmaps are designed to speed up the "counting objects" phase of
subsequent packs created for clones and fetches.
But in some cases, turning bitmaps on does horrible things for "push"
performance[1].

[1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/

Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
---
    send-pack.c: add config push.useBitmaps
    
    This patch add config push.useBitmaps to prevent git push using bitmap.
    
    The origin discussion is here:
    https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/
    
    Thanks, Kyle

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1263%2Fkeyu98%2Fkz%2Fpush-usebitmps-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1263/keyu98/kz/push-usebitmps-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1263

 Documentation/config/push.txt |  4 ++++
 send-pack.c                   |  6 ++++++
 send-pack.h                   |  3 ++-
 t/t5516-fetch-push.sh         | 11 +++++++++++
 4 files changed, 23 insertions(+), 1 deletion(-)


base-commit: 8168d5e9c23ed44ae3d604f392320d66556453c9

Comments

Derrick Stolee June 15, 2022, 1:09 p.m. UTC | #1
On 6/15/22 7:08 AM, Kyle Zhao via GitGitGadget wrote:
> From: Kyle Zhao <kylezhao@tencent.com>
> 
> This allows you to disabled bitmaps for "git push". Default is false.

s/disabled/disable/

> Bitmaps are designed to speed up the "counting objects" phase of
> subsequent packs created for clones and fetches.
> But in some cases, turning bitmaps on does horrible things for "push"
> performance[1].

I would rephrase this message body as follows:

  Reachability bitmaps are designed to speed up the "counting objects"
  phase of generating a pack during a clone or fetch. They are not
  optimized for Git clients sending a small topic branch via "git push".
  In some cases (see [1]), using reachability bitmaps during "git push"
  can cause significant performance regressions.

  Add a new "push.useBitmaps" config option to disable reachability
  bitmaps during "git push". This allows reachability bitmaps to still
  be used in other areas, such as "git rev-list --use-bitmap-index".

> [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/

> +
> +push.useBitmaps::
> +	If this config and `pack.useBitmaps` are both "true", git will
> +	use pack bitmaps (if available) when git push. Default is false.

Rewording slightly:

  If this config and `pack.useBitmaps` are both `true`, then Git will
  use reachability bitmaps during `git push`, if available (disabled
  by default).

> \ No newline at end of file

Please fix this missing newline.

I'm glad that this references `pack.useBitmaps`. I wonder if that
config is sufficient for your purposes: do you expect to use your
bitmaps to generate pack-files in any other way than "git push"?

That is: are you serving remote requests from your repo with your
bitmaps while also using "git push"? Or, are you using bitmaps
only for things like "git rev-list --use-bitmap-index"?

I just want to compare this with `pack.useSparse` which targets
"git push", but focuses entirely on the pack-creation part of the
operation. Since the existence of reachability bitmaps overrides
the sparse algorithm, this does not affect Git servers (who should
have a reachability bitmap).

I just want to be sure that using pack.useBitmaps=false would not
be sufficient for your situation (and probably most situations).


> diff --git a/send-pack.c b/send-pack.c
> index bc0fcdbb000..d6091571caa 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -84,6 +84,8 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
>  		strvec_push(&po.args, "--progress");
>  	if (is_repository_shallow(the_repository))
>  		strvec_push(&po.args, "--shallow");
> +	if (!args->use_bitmaps)
> +		strvec_push(&po.args, "--no-use-bitmap-index");

Here is where you specify the lower-level pack creation only
when sending a pack. It is very focused. Good.

> +	int use_bitmaps = 0;

> +	git_config_get_bool("push.usebitmaps", &use_bitmaps);
> +	if (use_bitmaps)
> +		args->use_bitmaps = 1;

You can instead write this as:

	if (!git_config_get_bool("push.usebitmaps", &use_bitmaps))
		args->use_bitmaps = use_bitmaps;

> diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
> index dedca106a7a..ee0b912a5e8 100755
> --- a/t/t5516-fetch-push.sh
> +++ b/t/t5516-fetch-push.sh
> @@ -1865,4 +1865,15 @@ test_expect_success 'push warns or fails when using username:password' '
>  	test_line_count = 1 warnings
>  '
>  
> +test_expect_success 'push with config push.useBitmaps' '
> +	mk_test testrepo heads/main &&
> +	git checkout main &&
> +	GIT_TRACE=1 git push testrepo main:test >/dev/null 2>stderr &&

Just use "err" instead of "stderr".

> +	grep "no-use-bitmap-index" stderr &&
> +
> +	git config push.useBitmaps true &&
> +	GIT_TRACE=1 git push testrepo main:test2 >/dev/null 2>stderr &&
> +	! grep "no-use-bitmap-index" stderr
> +'

I believe this test is correct, but it might be worth looking
into test_subcommand if you can determine the exact subcommand
arguments you are looking for.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason June 15, 2022, 7:47 p.m. UTC | #2
On Wed, Jun 15 2022, Kyle Zhao via GitGitGadget wrote:

[CC'd Taylor, who's worked a lot on bitmaps]

> From: Kyle Zhao <kylezhao@tencent.com>
>
> This allows you to disabled bitmaps for "git push". Default is false.

Thanks for following up.

Refresh my memory, we always use them if we find them now, correct?
I.e. if repack.writebitmaps=true

This doesn't make it clear: Are we now going to ignore them for "push"
by default, even if they exist? I.e. a change in the current default.

I think I recall from the previous discussions that it was a bit of hit
& miss, maybe we're confident that they're almost (or always?) bad for
"push", but I think there *are* cases where they help.

> Bitmaps are designed to speed up the "counting objects" phase of
> subsequent packs created for clones and fetches.
> But in some cases, turning bitmaps on does horrible things for "push"
> performance[1].
>
> [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/
>
> Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
> ---
>     send-pack.c: add config push.useBitmaps
>     
>     This patch add config push.useBitmaps to prevent git push using bitmap.
>     
>     The origin discussion is here:
>     https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/
>     
>     Thanks, Kyle
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1263%2Fkeyu98%2Fkz%2Fpush-usebitmps-config-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1263/keyu98/kz/push-usebitmps-config-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1263
>
>  Documentation/config/push.txt |  4 ++++
>  send-pack.c                   |  6 ++++++
>  send-pack.h                   |  3 ++-
>  t/t5516-fetch-push.sh         | 11 +++++++++++
>  4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/config/push.txt b/Documentation/config/push.txt
> index e32801e6c91..d8fb0bd1414 100644
> --- a/Documentation/config/push.txt
> +++ b/Documentation/config/push.txt
> @@ -137,3 +137,7 @@ push.negotiate::
>  	server attempt to find commits in common. If "false", Git will
>  	rely solely on the server's ref advertisement to find commits
>  	in common.
> +
> +push.useBitmaps::
> +	If this config and `pack.useBitmaps` are both "true", git will
> +	use pack bitmaps (if available) when git push. Default is false.
> \ No newline at end of file

"git diff" is telling you something is wrong here :) hint: missing \n.

> diff --git a/send-pack.c b/send-pack.c
> index bc0fcdbb000..d6091571caa 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -84,6 +84,8 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
>  		strvec_push(&po.args, "--progress");
>  	if (is_repository_shallow(the_repository))
>  		strvec_push(&po.args, "--shallow");
> +	if (!args->use_bitmaps)
> +		strvec_push(&po.args, "--no-use-bitmap-index");
>  	po.in = -1;
>  	po.out = args->stateless_rpc ? -1 : fd;
>  	po.git_cmd = 1;
> @@ -482,6 +484,7 @@ int send_pack(struct send_pack_args *args,
>  	int use_push_options = 0;
>  	int push_options_supported = 0;
>  	int object_format_supported = 0;
> +	int use_bitmaps = 0;
>  	unsigned cmds_sent = 0;
>  	int ret;
>  	struct async demux;
> @@ -497,6 +500,9 @@ int send_pack(struct send_pack_args *args,
>  	git_config_get_bool("push.negotiate", &push_negotiate);
>  	if (push_negotiate)
>  		get_commons_through_negotiation(args->url, remote_refs, &commons);
> +	git_config_get_bool("push.usebitmaps", &use_bitmaps);
> +	if (use_bitmaps)
> +		args->use_bitmaps = 1;

[A bit rambling, sorry]

A bit off of a use of the API, can't we just do:

	git_config_get_bool("push.usebitmaps", &args->use_bitmaps);

And drop the local variable? I.e. if we have a config variable we'll
write the value to it.

But then again that goes with the suggested default, i.e. I think we
should probably use them by default, and provide an out, unless we're
*really* sure they're useless for "push".

In this case I found the code a bit odd, which took me a moment to put
my finger on.

In builtin/send-pack.c we initialize "struct send_pack_args" in the file
scope, so it's zero'd out.

So all parameters are 0'd by default.

So your new "use_bitmaps" is born false.

Then when we get here you assign 0 to use_bitmaps, and only if it's true
do you use it.

Which to me is a couple of layers in to something that's less clear than
it could be, i.e. we're making the hard assumption here that the default
in the struct is false. So we should really just do:

	int tmp;
	if (!git_config_get_bool("push.usebitmaps", &tmp))
		args->use_bitmaps = tmp;

So don't care what the default was before, we have an explicit config
variable we're going to use, if we saw push.usebitmaps let's use its
value.

This way the default can flip, and this code downstream of that will
still do what's intended.

FWIW that "if" is redundant, but it's the general idiom, but we *can* do
it the way I suggested above, but I think it's clearer to go with the
second form, which reads more obviously as "if the variable exists, set
it to ...".

For config.c in particular the "without the if" works, but I think
that's relying on an implementation detail, i.e. we have a few other
APIs that "zero out" the parameter you put in as the first thing they
do. So if they also return "I didn't error" you want to use a temp
variable, and only assign it to your "real" variable if the return value
was "OK".

>  
>  	git_config_get_bool("transfer.advertisesid", &advertise_sid);
>  
> diff --git a/send-pack.h b/send-pack.h
> index e148fcd9609..f7af1b0353e 100644
> --- a/send-pack.h
> +++ b/send-pack.h
> @@ -26,7 +26,8 @@ struct send_pack_args {
>  		/* One of the SEND_PACK_PUSH_CERT_* constants. */
>  		push_cert:2,
>  		stateless_rpc:1,
> -		atomic:1;
> +		atomic:1,
> +		use_bitmaps:1;
>  	const struct string_list *push_options;
>  };
>  
> diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
> index dedca106a7a..ee0b912a5e8 100755
> --- a/t/t5516-fetch-push.sh
> +++ b/t/t5516-fetch-push.sh
> @@ -1865,4 +1865,15 @@ test_expect_success 'push warns or fails when using username:password' '
>  	test_line_count = 1 warnings
>  '
>  
> +test_expect_success 'push with config push.useBitmaps' '
> +	mk_test testrepo heads/main &&
> +	git checkout main &&
> +	GIT_TRACE=1 git push testrepo main:test >/dev/null 2>stderr &&

We generally don't >/dev/null in tests, just emit the output, and if we
run with -v you'll see it.

In this case though you want just:

    GIT_TRACE="$PWD/trace.log" [...]

Without any redirection,

Also, you probably want GIT_TRACE2_EVENT=$PWD/trace.json, and see
"test_subcommand". We have a handy helper for finding if we have trace2
regions.

I'm assuming we have some region for "used bitmaps" already, but I
didn't check...
Taylor Blau June 15, 2022, 9:24 p.m. UTC | #3
On Wed, Jun 15, 2022 at 09:09:35AM -0400, Derrick Stolee wrote:
> I just want to be sure that using pack.useBitmaps=false would not
> be sufficient for your situation (and probably most situations).

I think the only other affected scenario on the client side would be
repacking. And in theory most clients are repacking in the background
anyways, so any speed-ups from using bitmaps wouldn't be noticeable
anyway.

I think just relying on the existing pack.useBitmaps config should be
sufficient here.

Thanks,
Taylor
Taylor Blau June 15, 2022, 9:32 p.m. UTC | #4
On Wed, Jun 15, 2022 at 09:47:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Refresh my memory, we always use them if we find them now, correct?
> I.e. if repack.writebitmaps=true

Kind of. `rev-list` has an opt-in `--use-bitmap-index` option, and
`pack-objects` has `pack.useBitmaps` which controls whether or not
bitmaps are read.

So `git rev-list --objects HEAD` won't use any bitmaps, even if they
exist, but `git rev-list --objects --use-bitmap-index HEAD` will. There
there's a good reason not to use bitmaps by default, which is that they
(effectively) randomize the ordering of your output.

`pack-objects` behaves slightly differently, cf. its
`use_bitmap_index{,default}` variables to see how that works.

> This doesn't make it clear: Are we now going to ignore them for "push"
> by default, even if they exist? I.e. a change in the current default.
>
> I think I recall from the previous discussions that it was a bit of hit
> & miss, maybe we're confident that they're almost (or always?) bad for
> "push", but I think there *are* cases where they help.

Yeah. In theory they should help most of the time as long as the bitmaps
are kept reasonably up-to-date. There's a tradeoff between using bitmaps
and how much walking between bitmap and ref-tips is required. Since
every object we encounter between the most recent bitmap and the thing
we're trying to generate a bitmap for has to incur extra overhead in
order to find and mark its position in the bitmap being generated,
there's a certain point at which it would have been faster to just walk
down to the roots and avoid bitmaps altogether.

But I suspect that in this case the bitmaps are just simply stale, and
that a "git repack -adb" or more aggressive background maintenance would
make things quite a bit faster.

Thanks,
Taylor
Kyle Zhao June 16, 2022, 2:13 a.m. UTC | #5
> > I'm glad that this references `pack.useBitmaps`. I wonder if that config is sufficient for your purposes: do you expect to use your bitmaps to generate pack-files in any other way > > I just want to be sure that using pack.useBitmaps=false would not be 
> > sufficient for your situation (and probably most situations).
> 
> I think the only other affected scenario on the client side would be repacking. And in theory most clients are repacking in the background anyways, so any speed-ups from using bitmaps wouldn't be noticeable anyway.
> 
> I think just relying on the existing pack.useBitmaps config should be sufficient here.

In fact ,we also use "git push" on our server side. 
Each git repositories have multiple replicas on our servers to improve system disaster tolerance and read performance.

For example, a git repo will be distributed on multiple servers (like server-1, server-2, server-3).
If user pushes the pack to server-1, then server-1 will call "git push" to transfer the objects data to server-2 and server-3.
And users can clone from all the server mentioned above.
Under such a process, our system works well most of the time.

If we set pack.useBitmaps=false, "git upload-pack" will also be affected.
For my situation, it would be sufficient when set both pack.useBitmaps=true and push.useBitmaps=false.

> But I suspect that in this case the bitmaps are just simply stale, and
> that a "git repack -adb" or more aggressive background maintenance would
> make things quite a bit faster.

It doesn't seem to be the reason.
I have already called  "git repack -adb" in this case[1] but that didn't seem to fix the push performance issue.
You can see that the git repo had only one pack at that time.

[1] https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/

Thanks,
- Kyle
diff mbox series

Patch

diff --git a/Documentation/config/push.txt b/Documentation/config/push.txt
index e32801e6c91..d8fb0bd1414 100644
--- a/Documentation/config/push.txt
+++ b/Documentation/config/push.txt
@@ -137,3 +137,7 @@  push.negotiate::
 	server attempt to find commits in common. If "false", Git will
 	rely solely on the server's ref advertisement to find commits
 	in common.
+
+push.useBitmaps::
+	If this config and `pack.useBitmaps` are both "true", git will
+	use pack bitmaps (if available) when git push. Default is false.
\ No newline at end of file
diff --git a/send-pack.c b/send-pack.c
index bc0fcdbb000..d6091571caa 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -84,6 +84,8 @@  static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
 		strvec_push(&po.args, "--progress");
 	if (is_repository_shallow(the_repository))
 		strvec_push(&po.args, "--shallow");
+	if (!args->use_bitmaps)
+		strvec_push(&po.args, "--no-use-bitmap-index");
 	po.in = -1;
 	po.out = args->stateless_rpc ? -1 : fd;
 	po.git_cmd = 1;
@@ -482,6 +484,7 @@  int send_pack(struct send_pack_args *args,
 	int use_push_options = 0;
 	int push_options_supported = 0;
 	int object_format_supported = 0;
+	int use_bitmaps = 0;
 	unsigned cmds_sent = 0;
 	int ret;
 	struct async demux;
@@ -497,6 +500,9 @@  int send_pack(struct send_pack_args *args,
 	git_config_get_bool("push.negotiate", &push_negotiate);
 	if (push_negotiate)
 		get_commons_through_negotiation(args->url, remote_refs, &commons);
+	git_config_get_bool("push.usebitmaps", &use_bitmaps);
+	if (use_bitmaps)
+		args->use_bitmaps = 1;
 
 	git_config_get_bool("transfer.advertisesid", &advertise_sid);
 
diff --git a/send-pack.h b/send-pack.h
index e148fcd9609..f7af1b0353e 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -26,7 +26,8 @@  struct send_pack_args {
 		/* One of the SEND_PACK_PUSH_CERT_* constants. */
 		push_cert:2,
 		stateless_rpc:1,
-		atomic:1;
+		atomic:1,
+		use_bitmaps:1;
 	const struct string_list *push_options;
 };
 
diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index dedca106a7a..ee0b912a5e8 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -1865,4 +1865,15 @@  test_expect_success 'push warns or fails when using username:password' '
 	test_line_count = 1 warnings
 '
 
+test_expect_success 'push with config push.useBitmaps' '
+	mk_test testrepo heads/main &&
+	git checkout main &&
+	GIT_TRACE=1 git push testrepo main:test >/dev/null 2>stderr &&
+	grep "no-use-bitmap-index" stderr &&
+
+	git config push.useBitmaps true &&
+	GIT_TRACE=1 git push testrepo main:test2 >/dev/null 2>stderr &&
+	! grep "no-use-bitmap-index" stderr
+'
+
 test_done