diff mbox series

promisor-remote: add promisor.quiet configuration option

Message ID 20240523131926.1959245-1-tom@compton.nu (mailing list archive)
State Superseded
Headers show
Series promisor-remote: add promisor.quiet configuration option | expand

Commit Message

Tom Hughes May 23, 2024, 1:19 p.m. UTC
Add a configuration optione to allow output from the promisor
fetching objects to be suppressed/

This allows us to stop commands like git blame being swamped
with progress messages and gc notifications from the promisor
when used in a partial clone.

Signed-off-by: Tom Hughes <tom@compton.nu>
---
 Documentation/config.txt          |  2 ++
 Documentation/config/promisor.txt |  3 ++
 promisor-remote.c                 |  3 ++
 t/t0410-partial-clone.sh          | 47 +++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+)
 create mode 100644 Documentation/config/promisor.txt

Comments

Junio C Hamano May 23, 2024, 10:23 p.m. UTC | #1
Tom Hughes <tom@compton.nu> writes:

> Add a configuration optione to allow output from the promisor
> fetching objects to be suppressed/

"optione" -> "option", "suppressed/" -> "suppressed.".

> This allows us to stop commands like git blame being swamped
> with progress messages and gc notifications from the promisor
> when used in a partial clone.

"git blame" -> "'git blame'", perhaps.

It is an interesting observation.  I thought "git blame" was quite
bad at streaming (i.e., until it learned the origin of each and
every line, it never produced any output the user asked for), which
actually would make it a non issue that the output the user wanted
gets mixed with the progress messages and other garbage.  Unless the
user understands that "git blame" is not spending time itself, but
is waiting for necessary blobs to be fetched from the promisor, and
is expected to wait unusally longer than the fully local case,
having to stare at a blank/unchanging screen would make it uneasy
for the end-user and that is why we have progress eye-candy.

I am OK for promisor.quiet being optional, but I am torn when I
imagine what comes next.  On one hand, I myself probably would find
it neat to make these lazy fetches happen completely silently as if
nothing strange is happening from the point of view of end-users
(except for some operations may be unusually slow compared to fully
local repository).  On the other hand, I suspect people will be
tempted to push it to be on by default at which time it may hurt
unsuspecting (new) users who may have been helped by progress bars.

> diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
> new file mode 100644
> index 0000000000..98c5cb2ec2
> --- /dev/null
> +++ b/Documentation/config/promisor.txt
> @@ -0,0 +1,3 @@
> +promisor.quiet::
> +	If set to "true" assume `--quiet` when fetching additional
> +	objects for a partial clone.

OK.

> diff --git a/promisor-remote.c b/promisor-remote.c

The implementation is absolutely trivial and straight-forward.

> +test_expect_success TTY 'promisor.quiet=false works' '

Do not say "works"---recall the best practice of writing a good bug
reports.  Stating your expectation more explicitly, e.g. "shows
progress messages" or somesuch.

> +	rm -rf server server2 repo &&
> +	rm -rf server server3 repo &&

Why remove the same thing twice?

> +	test_create_repo server &&
> +	test_commit -C server foo &&
> +	git -C server repack -a -d --write-bitmap-index &&
> +
> +	git clone "file://$(pwd)/server" repo &&
> +	git hash-object repo/foo.t >blobhash &&

Do you need a temporary file, or would

	blobhash=$(git hash-object repo/foo.t) &&

work just fine?  Of course you'd later have to say

	... git -C repo cat-file -p $blobhash

instead of "$(cat blobhash)".  Even simpler, I wonder if you can
remove this hash-object invocation, and then do

	... git -C repo cat-file -p :foo.t


> +	rm -rf repo/.git/objects/* &&

This! IS! BAD! for the reason stated later ...


> +	git -C server config uploadpack.allowanysha1inwant 1 &&
> +	git -C server config uploadpack.allowfilter 1 &&

... but these are OK and expected, ...

> +	git -C repo config core.repositoryformatversion 1 &&
> +	git -C repo config extensions.partialclone "origin" &&

... and this is way too different from what would happen in the real
life.

I'd prefer not to see manual destruction of $GIT_DIR/objects/* or
manual futzing of repository format version and extensions.  These
configuration variables are *NOT* for end-users to futz with, and
the tests should not be doing so either.

Can't we prepare the "repo" only by creating a partial clone in an
usual way?

> +	git -C repo config promisor.quiet "false" &&

This of course is good, as this is what the test wants to check.

> +	test_terminal git -C repo cat-file -p $(cat blobhash) 2>err &&

It seems that exactly the same set of comments apply to the next
one, so I'll refrain from repeating myself.

Thanks.

> +test_expect_success TTY 'promisor.quiet=true works' '
> +	rm -rf server server2 repo &&
> +	rm -rf server server3 repo &&
> +	test_create_repo server &&
> +	test_commit -C server foo &&
> +	git -C server repack -a -d --write-bitmap-index &&
> +
> +	git clone "file://$(pwd)/server" repo &&
> +	git hash-object repo/foo.t >blobhash &&
> +	rm -rf repo/.git/objects/* &&
> +
> +	git -C server config uploadpack.allowanysha1inwant 1 &&
> +	git -C server config uploadpack.allowfilter 1 &&
> +	git -C repo config core.repositoryformatversion 1 &&
> +	git -C repo config extensions.partialclone "origin" &&
> +	git -C repo config promisor.quiet "true" &&
> +
> +	test_terminal git -C repo cat-file -p $(cat blobhash) 2>err &&
> +
> +	# Ensure that no progress messages are written
> +	! grep "Receiving objects" err
> +'
> +
>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd
Tom Hughes May 24, 2024, 8:31 a.m. UTC | #2
On 23/05/2024 23:23, Junio C Hamano wrote:

> It is an interesting observation.  I thought "git blame" was quite
> bad at streaming (i.e., until it learned the origin of each and
> every line, it never produced any output the user asked for), which
> actually would make it a non issue that the output the user wanted
> gets mixed with the progress messages and other garbage.  Unless the
> user understands that "git blame" is not spending time itself, but
> is waiting for necessary blobs to be fetched from the promisor, and
> is expected to wait unusally longer than the fully local case,
> having to stare at a blank/unchanging screen would make it uneasy
> for the end-user and that is why we have progress eye-candy.

Blame actually has it's own progress message that counts the
number of lines analysed which gets interrupted by the progress
messages from the promisor.

Something like "git log -S" behaves a bit differently - it doesn't
have progress and because it's using a pager by default that causes
the promisor progress to be suppressed because stderr is no longer
a terminal but you do still get lots of background gc notifications.

> I am OK for promisor.quiet being optional, but I am torn when I
> imagine what comes next.  On one hand, I myself probably would find
> it neat to make these lazy fetches happen completely silently as if
> nothing strange is happening from the point of view of end-users
> (except for some operations may be unusually slow compared to fully
> local repository).  On the other hand, I suspect people will be
> tempted to push it to be on by default at which time it may hurt
> unsuspecting (new) users who may have been helped by progress bars.

I do agree that it's hard to know what the right thing to do is here
or even to know the full scope of the effect.

I'll update the patch to address the specific review comments.

Tom
Jeff King May 25, 2024, 5:29 a.m. UTC | #3
On Thu, May 23, 2024 at 02:19:26PM +0100, Tom Hughes wrote:

> Add a configuration optione to allow output from the promisor
> fetching objects to be suppressed/
> 
> This allows us to stop commands like git blame being swamped
> with progress messages and gc notifications from the promisor
> when used in a partial clone.

I'm not at all opposed to providing a way to suppress this, but I feel
like in the long run, the more fundamental issue is that git-blame kicks
off a zillion fetches as it traverses. That's not only ugly but it's
also horribly inefficient.

In an ideal world we'd queue all of the blobs we need, do a single
fetch, and then compute the blame on the result. That's probably easier
said than done, though we have done it in other spots (e.g., for
checkout).

In terms of user experience, you can simulate it with something like:

  # fault in all of the necessary blobs in one batch
  git rev-list HEAD -- $file |
  git diff-tree --stdin --format= -r --diff-filter=d -m --raw -- $file |
  awk '{print $4}' |
  git -c fetch.negotiationAlgorithm=noop \
      fetch --no-tags --no-write-fetch-head --recurse-submodules=no \
      --filter=blob:none --stdin

  git blame $file

Obviously that command is horrid and not something users should have to
care about. But if we had some way for blame to say "hey, I am
traversing from X..Y, looking at these pathspecs", then our first
lazy-fetch could try to grab all of them. And I think the same would be
the case for "git log -p", and so on.

Doing a separate traversal isn't maximally efficient, but it might not
be too bad in practice (and we could even do partial traversals to
balance chunking versus responsiveness, though in the case of
non-incremental blame we need everything before we generate an
answer anyway).

But anyway, I bring it up here because I think once we reach that end
state, it won't be as interesting to turn off the fetch progress.

-Peff
Tom Hughes May 25, 2024, 10:29 a.m. UTC | #4
On 25/05/2024 06:29, Jeff King wrote:
> On Thu, May 23, 2024 at 02:19:26PM +0100, Tom Hughes wrote:
> 
>> Add a configuration optione to allow output from the promisor
>> fetching objects to be suppressed/
>>
>> This allows us to stop commands like git blame being swamped
>> with progress messages and gc notifications from the promisor
>> when used in a partial clone.
> 
> I'm not at all opposed to providing a way to suppress this, but I feel
> like in the long run, the more fundamental issue is that git-blame kicks
> off a zillion fetches as it traverses. That's not only ugly but it's
> also horribly inefficient.

This is true. One thing I found that makes things a lot more
efficient if you're using ssh as the transport is to enable
persistent multiplexing in .ssh/config with something like:

Host git.example.com
   ControlMaster auto
   ControlPath /run/user/%i/ssh/control.%C
   ControlPersist 1m
   SendEnv GIT_PROTOCOL

which avoids each fetch having to setup and authenticate a
new ssh session.

> In an ideal world we'd queue all of the blobs we need, do a single
> fetch, and then compute the blame on the result. That's probably easier
> said than done, though we have done it in other spots (e.g., for
> checkout).

That would certainly be an excellent improvement, yes.

Tom
Jeff King May 29, 2024, 9:36 a.m. UTC | #5
On Sat, May 25, 2024 at 11:29:13AM +0100, Tom Hughes wrote:

> > I'm not at all opposed to providing a way to suppress this, but I feel
> > like in the long run, the more fundamental issue is that git-blame kicks
> > off a zillion fetches as it traverses. That's not only ugly but it's
> > also horribly inefficient.
> 
> This is true. One thing I found that makes things a lot more
> efficient if you're using ssh as the transport is to enable
> persistent multiplexing in .ssh/config with something like:
> 
> Host git.example.com
>   ControlMaster auto
>   ControlPath /run/user/%i/ssh/control.%C
>   ControlPersist 1m
>   SendEnv GIT_PROTOCOL
> 
> which avoids each fetch having to setup and authenticate a
> new ssh session.

Good point. That is sort of the opposite approach of my suggestion. That
is, I was suggesting that git-blame batch everything to make a single
efficient request. But if we could reduce the cost of making individual
requests, then we wouldn't need to batch (which is quite a lot simpler).

The ssh session is going to be one source of latency and overhead. But
just spawning the fetch and remote upload-pack are another (especially
if you have to authenticate, and especially with the v2 protocol, which
has an extra round-trip for capabilities upgrade).

If there was a long-running mode to git-fetch where it kept open a v2
session to the server and just said "hey, send me object X" and then
"OK, now send me object Y" that would eliminate all of that overhead
(and even for http, under the hood curl is good at keeping the session
open between requests).

You'd still have some extra latency (while you're talking to the server,
the local blame process is paused), but I suspect it would be a lot more
tolerable.

And now your progress question is re-opened again. You might want a
more succinct progress for something like blame that still does all of
its fetching before generating output. E.g., you might want a single
progress line with the current state (fetching or not), the count of
fetched objects, the speed, and so on. And for something like "git log
-p", where the progress would be interspersed with actual output, you
might want to suppress it entirely.

So yeah, I have no real objection to what your patch is doing. Depending
on how future work unfolds it might be more or less useful than it is
now, but even in the worst case it probably won't be a bad thing to have
in our toolbox.

-Peff
diff mbox series

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 70b448b132..6cae835db9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -487,6 +487,8 @@  include::config/pager.txt[]
 
 include::config/pretty.txt[]
 
+include::config/promisor.txt[]
+
 include::config/protocol.txt[]
 
 include::config/pull.txt[]
diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
new file mode 100644
index 0000000000..98c5cb2ec2
--- /dev/null
+++ b/Documentation/config/promisor.txt
@@ -0,0 +1,3 @@ 
+promisor.quiet::
+	If set to "true" assume `--quiet` when fetching additional
+	objects for a partial clone.
diff --git a/promisor-remote.c b/promisor-remote.c
index b414922c44..2ca7c2ae48 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -23,6 +23,7 @@  static int fetch_objects(struct repository *repo,
 	struct child_process child = CHILD_PROCESS_INIT;
 	int i;
 	FILE *child_in;
+	int quiet;
 
 	if (git_env_bool(NO_LAZY_FETCH_ENVIRONMENT, 0)) {
 		static int warning_shown;
@@ -41,6 +42,8 @@  static int fetch_objects(struct repository *repo,
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
 		     "--filter=blob:none", "--stdin", NULL);
+	if (!git_config_get_bool("promisor.quiet", &quiet) && quiet)
+		strvec_push(&child.args, "--quiet");
 	if (start_command(&child))
 		die(_("promisor-remote: unable to fork off fetch subprocess"));
 	child_in = xfdopen(child.in, "w");
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 88a66f0904..99257c3792 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -3,6 +3,7 @@ 
 test_description='partial clone'
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-terminal.sh
 
 # missing promisor objects cause repacks which write bitmaps to fail
 GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
@@ -689,6 +690,52 @@  test_expect_success 'lazy-fetch when accessing object not in the_repository' '
 	! grep "[?]$FILE_HASH" out
 '
 
+test_expect_success TTY 'promisor.quiet=false works' '
+	rm -rf server server2 repo &&
+	rm -rf server server3 repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	git hash-object repo/foo.t >blobhash &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C server config uploadpack.allowanysha1inwant 1 &&
+	git -C server config uploadpack.allowfilter 1 &&
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "origin" &&
+	git -C repo config promisor.quiet "false" &&
+
+	test_terminal git -C repo cat-file -p $(cat blobhash) 2>err &&
+
+	# Ensure that progress messages are written
+	grep "Receiving objects" err
+'
+
+test_expect_success TTY 'promisor.quiet=true works' '
+	rm -rf server server2 repo &&
+	rm -rf server server3 repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	git hash-object repo/foo.t >blobhash &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C server config uploadpack.allowanysha1inwant 1 &&
+	git -C server config uploadpack.allowfilter 1 &&
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "origin" &&
+	git -C repo config promisor.quiet "true" &&
+
+	test_terminal git -C repo cat-file -p $(cat blobhash) 2>err &&
+
+	# Ensure that no progress messages are written
+	! grep "Receiving objects" err
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd